LOCAL ALIGNMENT OF MARKOV CHAINS

By Niels Richard Hansen 

University of Copenhagen 

We consider local alignments without gaps of two independent 
Markov chains from a finite alphabet, and we derive sufficient con- 
ditions for the number of essentially different local alignments with 
a score exceeding a high threshold to be asymptotically Poisson dis- 
tributed. From the Poisson approximation a Gumbel approximation 
of the maximal local alignment score is obtained. The results ex- 
tend those obtained by Dembo, Karlin and Zeitouni [Ann. Probab. 
22 (1994) 2022-2039] for independent sequences of i.i.d. variables. 

1. Introduction. Local alignment of two biological sequences (DNA-mole- 
cules or proteins) is one of the most important and used tools in modern 
molecular biology for locating highly similar contiguous parts of the se- 
quences. High similarity is usually interpreted as an evolutionary or func- 
tional relationship between the molecules. We show how the distribution 
of local alignment similarity scores behaves asymptotically when aligning 
independent Markov chains. 

It is important to understand the distribution of local alignment scores 
for assessing the significance of, for example, the maximally scoring local 
alignment. Formally this is a test of the null hypothesis that two sequences 
are independent Markov chains against a somewhat unspecified alternative 
that they are not independent. The test statistic considered is the maximal 
local similarity score. 

Usually when considering local alignments we are interested in not only 
the maximally scoring local alignment but also other essentially different 
local alignments that reach a score above a given threshold. It is therefore 
useful also to know the distribution of the number of local alignments of 
independent Markov chains that reach a score above a given threshold. In 
fact, it is this problem that we handle in the first place and the obtained
Poisson approximation can easily be turned into a Gumbel approximation
of the distribution of the maximal local alignment score.

Received November 2004; revised September 2005.
AMS 2000 subject classifications. Primary 60G70; secondary 60F10.
Key words and phrases. Chen-Stein method, extreme value theory, large deviations,
local alignment, Markov additive processes, Markov chains, Poisson approximation.

This is an electronic reprint of the original article published by the
Institute of Mathematical Statistics in The Annals of Applied Probability,
2006, Vol. 16, No. 3, 1262-1296. This reprint differs from the original in
pagination and typographic detail.

The kind of local alignment we consider is gapless local alignment mean- 
ing that we search for (contiguous) parts of the two sequences that attain 
a high similarity when matched letter by letter. Similarity is measured by 
adding up a score for each pair of matched letters. In practice it is common 
to allow for the insertion of gaps in the sequences — each gap adding a suit- 
able penalty to the similarity score — which usually increases the power of the 
test. The introduction of gaps does, however, make the problem of under- 
standing the asymptotic distribution of local alignment scores substantially 
more complicated although progress for i.i.d. sequences has been made more 
recently; see [4, 10, 17, 18]. In another direction, exact distributional results 
for i.i.d. sequences can be obtained if "shifting" is not allowed and if the 
scores are integer valued; see [14]. This work has also been generalized to 
Markov sequences; see [15]. 

The main result is stated as Theorem 3.1. It says that if the expected
similarity score under the null hypothesis is negative, then there exist
constants $\theta^*, K^* > 0$ such that if we let $s$ denote the maximal local
alignment score obtained when aligning two independent Markov chains of
length $n$, then the normalized score defined by

(1)  $s' = \theta^* s - \log(K^* n^2)$

approximately follows a Gumbel distribution for $n \to \infty$. Moreover, the
number of normalized local alignment scores exceeding the threshold $x$ is
approximately Poisson distributed with mean $\exp(-x)$ for $n \to \infty$. We
have ignored some details and there are certain assumptions that need to be
fulfilled for this to be a mathematically rigorous statement. We refer to
Theorem 3.1 and its prerequisites.

It should be mentioned that the results are the expected generalizations of 
those obtained in [6] for independent i.i.d. sequences, but the techniques of 
proof are not straightforward generalizations. Indeed, this author would like 
to emphasize the novelty of certain techniques developed in this paper. In 
particular the results achieved in Sections 5.4 and 5.5 may be of independent 
interest. Moreover, the framework of Markov chains does not only provide a 
change of the null hypothesis but it also opens up the possibility of choosing 
new types of score functions as we discuss in Remark 3.5. This can increase 
the power of the test. In addition, by expanding the state space suitably the 
results obtained in this paper also cover null hypotheses where the aligned 
sequences have a higher-order Markov dependency or come from a hidden 
Markov model. 
2. Local gapless alignment. Let $(X_n)_{n \ge 1}$ and $(Y_n)_{n \ge 1}$ be two
sequences of random variables taking values in a finite set $E$. We compare
parts of one sequence with parts of the other using a score function
$f : E \times E \to \mathbb{Z}$, and we define the random variables

$S_{i,j}^{\delta} = \sum_{k=1}^{\delta} f(X_{i+k}, Y_{j+k})$

for $\delta \ge 0$. The variable $S_{i,j}^{\delta}$ is the local score for the
local comparison of the sequence part $X_{i+1} \cdots X_{i+\delta}$ with the
sequence part $Y_{j+1} \cdots Y_{j+\delta}$.

We make the assumption that $f$ takes integer values to emphasize the
lattice nature of $f$ that is often met in practice. To assure that
$\mathbb{Z}$ indeed is the minimal lattice, the greatest common divisor of the
integers $f(x,y)$, $x, y \in E$, is assumed to be 1. The results obtained are
valid if $f$ takes real, nonlattice values in a slightly modified form; see
Remark 3.2.

The score function can be regarded as an $E \times E$ matrix, which is
convenient when writing down the values $f(x,y)$. We will find it most useful
to simply regard $f$ as an element in a vector space. Probability measures
will then be regarded as elements in the dual space and we use the
functional notation

$\nu(f) = \sum_{x,y \in E} f(x,y)\,\nu(x,y)$

to denote the mean of $f$ evaluated under the probability measure $\nu$.
For $n \ge 1$ define

$\mathcal{H}_n = \{(i,j,\delta) \mid 0 \le i \le i + \delta \le n,\ 0 \le j \le j + \delta \le n\}$.

The elements $(i,j,\delta) \in \mathcal{H}_n$ are called alignments.

We want to understand the distribution of the collection

$(S_{i,j}^{\delta})_{(i,j,\delta) \in \mathcal{H}_n}$

of local scores over all alignments. We will in particular be interested in
the distribution of

(2)  $\mathcal{M}_n = \max_{(i,j,\delta) \in \mathcal{H}_n} S_{i,j}^{\delta}$,

the maximal local score over the set of alignments. We will also study the
number, $C_n(t)$, say, of essentially different variables $S_{i,j}^{\delta}$ in
$\mathcal{H}_n$ exceeding some threshold $t > 0$. What we mean by "essentially
different" is defined precisely below.

The local scores are efficiently summarized in the score matrix
$(T_{i,j})_{0 \le i,j \le n}$, which is defined as follows. For $i = 0$ or
$j = 0$ let $T_{i,j} = 0$ and define recursively

(3)  $T_{i,j} = \max\{T_{i-1,j-1} + f(X_i, Y_j),\ 0\}$
for $i, j \ge 1$. As we will show (cf. Remark 3.6 below), the maximum
$\mathcal{M}_n$ can be computed as

(4)  $\mathcal{M}_n = \max_{i,j} T_{i,j}$.

This fact is closely related to the idea in the Smith-Waterman algorithm for
computing the (gapped) maximal local alignment score efficiently; see [21].
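As a minimal illustration (not part of the original paper), recursion (3) and identity (4) can be sketched in Python. The function names, the match/mismatch score and the example sequences are assumptions made for this sketch; sequences are 0-indexed, so $X_i$ corresponds to `x[i - 1]`.

```python
def score_matrix(x, y, f):
    """Score matrix T of recursion (3); T[i][0] = T[0][j] = 0."""
    n, m = len(x), len(y)
    T = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the diagonal run ending at (i-1, j-1), or restart at zero.
            T[i][j] = max(T[i - 1][j - 1] + f(x[i - 1], y[j - 1]), 0)
    return T


def max_local_score_bruteforce(x, y, f):
    """Definition (2): maximise S^delta_{i,j} over all alignments (i, j, delta)."""
    n, m = len(x), len(y)
    best = 0  # delta = 0 gives the empty sum
    for i in range(n + 1):
        for j in range(m + 1):
            s = 0
            for d in range(1, min(n - i, m - j) + 1):
                s += f(x[i + d - 1], y[j + d - 1])
                best = max(best, s)
    return best


def match_score(a, b):
    """Hypothetical score function: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


x = list("ACGTACGGT")
y = list("TACGGACGT")
T = score_matrix(x, y, match_score)
m_n = max(max(row) for row in T)  # right-hand side of (4)
```

The brute-force routine evaluates (2) directly, so comparing it with `m_n` checks (4) on the example; the recursion needs only one pass over the matrix.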

Definition 2.1. For $0 \le i, j \le n - 1$ define

$\Delta(i,j) = \inf\{\delta > 0 \mid S_{i,j}^{\delta} \le 0,\ \text{or}\ i + \delta = n,\ \text{or}\ j + \delta = n\}$.

If $T_{i,j} = 0$, the alignment $(i,j,\Delta(i,j))$ is called an excursion, and
we let $\mathcal{E}_n$ denote the set of all excursions.

Note that $\mathcal{E}_n$ is a stochastic subset of $\mathcal{H}_n$. It follows
from the definition of the score matrix $(T_{i,j})$ and the definition of an
excursion that if $(i,j,\Delta) \in \mathcal{E}_n$ and $0 \le \delta < \Delta$,
then

$T_{i+\delta,j+\delta} = S_{i,j}^{\delta}$.

An excursion corresponds to a diagonal strip in the score matrix, for which
the score starts at zero and then stays strictly positive along that diagonal
strip until it either reaches zero or the indices hit the boundary of the score
matrix.

The maximum over an excursion $e = (i,j,\Delta) \in \mathcal{E}_n$ is denoted by

(5)  $M_e = \max_{0 \le \delta \le \Delta} T_{i+\delta,j+\delta}$.

Definition 2.2. The number of essentially different excesses over $t$ is
defined as

(6)  $C_n(t) = \sum_{e \in \mathcal{E}_n} 1(M_e > t)$.

From (4) it follows that $(C_n(t) = 0) = (\mathcal{M}_n \le t)$.

3. Alignment of independent Markov chains. Assume that the stochastic
processes $(X_n)_{n \ge 1}$ and $(Y_n)_{n \ge 1}$ are independent Markov chains
with transition probabilities $P$ and $Q$, respectively. Assume that $P$ and
$Q$ are irreducible and aperiodic matrices with left invariant probability
vectors $\pi_P$ and $\pi_Q$, respectively. Let $\pi = \pi_P \otimes \pi_Q$. With

$\mu = \pi(f) = \sum_{x,y \in E} f(x,y)\,\pi_P(x)\,\pi_Q(y)$

the (invariant) mean of $f(X_1, Y_1)$ we will assume throughout that $\mu < 0$.
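Before the regularity conditions are introduced, the excursion count of Definitions 2.1 and 2.2 can be made concrete. The following sketch is an illustration only; the function names, the scoring rule and the two example sequences are assumptions, and both sequences are taken to have the same length $n$ as in the text.

```python
def score_matrix(x, y, f):
    """Score matrix T of recursion (3)."""
    n = len(x)
    T = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            T[i][j] = max(T[i - 1][j - 1] + f(x[i - 1], y[j - 1]), 0)
    return T


def excursion_maxima(x, y, f):
    """Return M_e for every excursion e = (i, j, Delta(i, j))."""
    n = len(x)
    assert len(y) == n
    T = score_matrix(x, y, f)
    maxima = []
    for i in range(n):
        for j in range(n):
            if T[i][j] != 0:
                continue  # excursions start where the score matrix is zero
            s, peak, d = 0, 0, 0
            while True:
                d += 1
                s += f(x[i + d - 1], y[j + d - 1])  # S^d_{i,j}
                peak = max(peak, s)
                # Delta(i, j): first d with S^d <= 0 or a boundary hit
                if s <= 0 or i + d == n or j + d == n:
                    break
            maxima.append(peak)
    return maxima


def count_excesses(x, y, f, t):
    """C_n(t) of Definition 2.2."""
    return sum(1 for m in excursion_maxima(x, y, f) if m > t)


def match_score(a, b):
    """Hypothetical score function: +1 for a match, -1 for a mismatch."""
    return 1 if a == b else -1


x = list("ACGTACGGTA")
y = list("TACGGACGTT")
m_n = max(max(row) for row in score_matrix(x, y, match_score))
c_n = count_excesses(x, y, match_score, t=2)
```

On any input the event identity $(C_n(t) = 0) = (\mathcal{M}_n \le t)$ can be checked by comparing `count_excesses(x, y, f, t) == 0` with `m_n <= t`.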

In the following, a cycle w.r.t. a matrix of transition probabilities $P$ is a
finite sequence $x_1, \ldots, x_n$ such that

$P(x_i, x_{i+1 \pmod n}) > 0$

for $i = 1, \ldots, n$. We will assume that the following regularity conditions
on $f$, $P$ and $Q$ are fulfilled: For some $n \ge 1$ there exist cycles
$x_1, \ldots, x_n$ (w.r.t. $P$) and $y_1, \ldots, y_n$ (w.r.t. $Q$) such that

(7)  $\sum_{k=1}^{n} f(x_k, y_k) > 0$.

For any $T \ge 1$ there exist an $n \ge 1$ and cycles $x_1, \ldots, x_n$ (w.r.t.
$P$) and $y_1, \ldots, y_n$ (w.r.t. $Q$) such that

(8)  $\sum_{k=1}^{n} f(x_k, y_k) \ne \sum_{k=1}^{n} f(x_k, y_{k+T \pmod n})$.

See Remark 3.3 below for comments related to this somewhat strange looking
condition.

For convenience we will assume that both Markov chains are stationary,
though the results obtained hold anyway. We denote by $\mathbb{P}$ the
probability measure $\mathbb{P}_\pi$ under which $(X_n, Y_n)_{n \ge 1}$ is a
stationary Markov chain with transition probabilities $P \otimes Q$. It will in
addition be convenient to assume that there exist auxiliary random variables
$X_0$ and $Y_0$ such that $(X_n, Y_n)_{n \ge 0}$ under $\mathbb{P}$ forms a
stationary Markov chain too. As usual $\mathbb{P}_{x,y}$ will denote the
probability measure where $X_1 = x$ and $Y_1 = y$.

We define for $\theta \in \mathbb{R}$ an $E^2 \times E^2$ matrix $\Phi(\theta)$
with positive entries by

$\Phi(\theta)_{(x,y),(x',y')} = \exp(\theta f(x',y'))\,P_{x,x'}\,Q_{y,y'}$

and we let $\varphi(\theta)$ denote the spectral radius (the Perron-Frobenius
eigenvalue) of this matrix. Then $\varphi$ is a convex $C^\infty$-function in
$\theta$, and due to (7), $\varphi(\theta) \to \infty$ for $\theta \to \infty$.
The fact that $\varphi$ is (log)convex is due to Kingman [12], and the implicit
function theorem can be used to show that $\varphi$ is $C^\infty$. Furthermore,
by Corollary XI.2.9(a) in [3] it holds that

(9)  $\partial_\theta \varphi(0) = \mu$,

hence if $\mu < 0$ there exists a (by convexity unique) solution $\theta^* > 0$
to the equation $\varphi(\theta) = 1$. If $r^*$ denotes the (up to scaling
unique) right eigenvector corresponding to the eigenvalue 1 for
$\Phi(\theta^*)$, the matrix defined by

$R^*_{(x,y),(x',y')} = \frac{r^*(x',y')}{r^*(x,y)}\,\Phi(\theta^*)_{(x,y),(x',y')}$

is an irreducible stochastic matrix with a unique left invariant probability
vector, which we will denote by $\pi^*$.
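The quantities $\mu$, $\theta^*$, $R^*$ and $\pi^*$ are all computable from $P$, $Q$ and $f$. The following NumPy sketch is one possible way to do so and is not taken from the paper: the bisection scheme, the function names and the two-letter example are assumptions. Pairs $(x,y)$ are flattened to the index $x|E| + y$.

```python
import numpy as np


def stationary(P):
    """Left invariant probability vector of an irreducible stochastic matrix."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()


def phi_matrix(theta, P, Q, f):
    """Phi(theta): entry ((x,y),(x',y')) = exp(theta f(x',y')) P[x,x'] Q[y,y']."""
    return np.kron(P, Q) * np.exp(theta * f.reshape(-1))[None, :]


def phi(theta, P, Q, f):
    """Spectral radius of Phi(theta)."""
    return float(np.max(np.abs(np.linalg.eigvals(phi_matrix(theta, P, Q, f)))))


def solve_theta_star(P, Q, f):
    """Positive root of phi(theta) = 1 by bisection (assumes mu < 0 and (7))."""
    hi = 1.0
    while phi(hi, P, Q, f) < 1.0:
        hi *= 2.0
    lo = hi
    while phi(lo, P, Q, f) >= 1.0:  # phi < 1 on (0, theta*) by convexity
        lo /= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if phi(mid, P, Q, f) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tilted_chain(theta_star, P, Q, f):
    """R* and its invariant vector pi*."""
    A = phi_matrix(theta_star, P, Q, f)
    w, v = np.linalg.eig(A)
    r = np.real(v[:, np.argmax(np.real(w))])
    r = r / r.sum()  # Perron vector, normalised to be positive
    R_star = A * r[None, :] / r[:, None]
    return R_star, stationary(R_star)


# Hypothetical two-letter example.
P = np.array([[0.7, 0.3], [0.4, 0.6]])
Q = np.array([[0.5, 0.5], [0.2, 0.8]])
f = np.array([[1.0, -1.0], [-1.0, 1.0]])

mu = float(stationary(P) @ f @ stationary(Q))
theta_star = solve_theta_star(P, Q, f)
R_star, pi_star = tilted_chain(theta_star, P, Q, f)
```

For this example $\pi_P = (4/7, 3/7)$ and $\pi_Q = (2/7, 5/7)$, so $\mu = -3/49 < 0$; the rows of `R_star` sum to one up to rounding because $\varphi(\theta^*) = 1$.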

With $g : E^2 \times E^2 \to \mathbb{R}$ any given function we introduce two
$E^3 \times E^3$ matrices, $\Phi_1(g)$ and $\Phi_2(g)$, by

$\Phi_1(g)_{(x,y,z),(x',y',z')} = \exp(g(x,y,x',y') + g(x,z,x',z'))\,P_{x,x'}\,Q_{y,y'}\,Q_{z,z'}$,

$\Phi_2(g)_{(x,w,y),(x',w',y')} = \exp(g(x,y,x',y') + g(w,y,w',y'))\,P_{x,x'}\,P_{w,w'}\,Q_{y,y'}$,

and we let $\varphi_1(g)$ and $\varphi_2(g)$ denote the corresponding spectral
radii. In terms of the functions $\varphi_1$ and $\varphi_2$ we define

$J_i = \sup_g \{2\tilde\pi(g) - \log \varphi_i(g)\}$

for $i = 1, 2$. Here $\tilde\pi = \pi^* \otimes R^*$ denotes the measure on
$E^2 \times E^2$ with point probabilities
$\tilde\pi(x,y,x',y') = \pi^*(x,y)\,R^*_{(x,y),(x',y')}$. We discuss $J_1$ and
$J_2$ in further detail in Remark 3.8.

Finally, if we define the process $(S_n)_{n \ge 0}$ by $S_0 = 0$ and for
$n \ge 1$,

(10)  $S_n = \sum_{k=1}^{n} f(X_k, Y_k)$,

we can define a constant, $K^*$, in terms of this process, as done, for example,
by (1.26) in Theorem B in [11]. We discuss this constant in further detail in
Remark 3.7.

Theorem 3.1. Assume that $\mu < 0$, that the regularity conditions given
by (7) and (8) are fulfilled, and that $\theta^*$ and $K^*$ are the constants
defined above. Define for $x \in \mathbb{R}$

(11)  $t_n = \frac{\log K^* + \log n^2 + x}{\theta^*}$

and $x_n \in [0, \theta^*)$ by $x_n = \theta^*(t_n - \lfloor t_n \rfloor)$. Then if

(12)  $2\min\{J_1, J_2\} > 3\theta^*\pi^*(f)$,

it holds that

(13)  $\|\mathcal{D}(C_n(t_n)) - \mathrm{Poi}(\exp(-x + x_n))\| \to 0$

for $n \to \infty$. Here $\|\cdot\|$ denotes the total variation norm and
$\mathcal{D}(C_n(t_n))$ is the distribution of $C_n(t_n)$. In particular

(14)  $\mathbb{P}(\mathcal{M}_n \le t_n) - \exp(-\exp(-x + x_n)) \to 0$

for $n \to \infty$.



The theorem deserves a number of remarks. 
Remark 3.2. The choice of $x_n = \theta^*(t_n - \lfloor t_n \rfloor)$ assures
that $t_n - x_n/\theta^* = \lfloor t_n \rfloor \in \mathbb{Z}$. Due to the
lattice effect arising from $f$ taking values in $\mathbb{Z}$ it follows that

$(C_n(t_n) = m) = (C_n(t_n - x_n/\theta^*) = m)$

as well as

$(\mathcal{M}_n \le t_n) = (\mathcal{M}_n \le t_n - x_n/\theta^*)$,

and this is the reason that we need to correct by $x_n$ in the asymptotic
formulas. If $f$ is a real, nonlattice function, Theorem 3.1 holds without the
$x_n$-correction.
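The threshold (11) and the lattice correction can be sketched numerically. The code below is only an illustration: the values of $\theta^*$ and $K^*$ are hypothetical placeholders rather than constants computed for any model, and the function names are assumptions. Since $-x + x_n = \log(K^* n^2) - \theta^* \lfloor t_n \rfloor$, the limit in (14) can be written, for an integer score $s$, as $\exp(-K^* n^2 \exp(-\theta^* s))$.

```python
import math


def threshold(n, x, theta_star, k_star):
    """Return (t_n, x_n) from (11) and Remark 3.2."""
    t_n = (math.log(k_star) + math.log(n ** 2) + x) / theta_star
    x_n = theta_star * (t_n - math.floor(t_n))
    return t_n, x_n


def max_score_cdf(s, n, theta_star, k_star):
    """Approximation of P(M_n <= s) for an integer score s, read off from (14)."""
    return math.exp(-k_star * n ** 2 * math.exp(-theta_star * s))


# Hypothetical constants, for illustration only.
theta_star, k_star, n, x = 0.3, 0.1, 1000, 2.0

t_n, x_n = threshold(n, x, theta_star, k_star)
gumbel_with_correction = math.exp(-math.exp(-x + x_n))
lattice_form = max_score_cdf(math.floor(t_n), n, theta_star, k_star)
```

The two quantities `gumbel_with_correction` and `lattice_form` agree up to rounding, which is exactly the statement that the correction $x_n$ moves the threshold to the lattice point $\lfloor t_n \rfloor$.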

Remark 3.3. The regularity condition (8) does not look particularly
nice in general but is usually satisfied by quite trivial arguments. Essentially
we want to avoid the situation where

(15)  $f(x,y) = f_1(x) + f_2(y)$

for two functions $f_1, f_2 : E \to \mathbb{R}$. It is clear that if $f$ is of
the form (15), then (8) does not hold. It is easy to verify that if $P$ and $Q$
have only strictly positive entries, condition (8) is equivalent to $f$ not
being of the form (15). In general, however, this author has not been able to
prove that $f$ not being of the form (15) is sufficient for (8) to hold. On the
other hand, no counterexamples have been found either. In the proof we will
explicitly need that (8) holds.

Remark 3.4. It is possible, and of practical relevance, to allow for the
aligned sequences to have different lengths $m$ and $n$, say. In this case
Theorem 3.1 holds for $n, m \to \infty$ with $t_n$ replaced by

$\frac{\log K^* + \log(mn) + x}{\theta^*}$.

Some restriction on the simultaneous growth of $m$ and $n$ must be made in
order for this to be true. In the proof of Lemma 5.15 we will need to be able
to choose integers $l_{n,m}$ fulfilling that

lim l ° g } nm) = lim l T ,=0, 

where $n, m \to \infty$ refers to the desired simultaneous growth of $n$ and
$m$. Clearly this can be achieved if $m \sim cn$ for some constant $c > 0$,
whereas, for example, $m \sim \log n$ does not work.

Remark 3.5. For notational convenience Theorem 3.1 was stated and
proved using a score function $f$ that depends on a single pair of variables
only. When aligning Markov chains it would be perfectly natural to use
a score function that depends on pair-transitions instead, that is,
$f : E^2 \times E^2 \to \mathbb{R}$ and

$S_{i,j}^{\delta} = \sum_{k=1}^{\delta} f(X_{i+k-1}, Y_{j+k-1}, X_{i+k}, Y_{j+k})$.

Theorem 3.1 holds for this kind of score function with the obvious
modifications. For instance, $\Phi$ is defined as

$\Phi(\theta)_{(x,y),(x',y')} = \exp(\theta f(x,y,x',y'))\,P_{x,x'}\,Q_{y,y'}$,

and $\pi^*$ in (12) is replaced by $\tilde\pi$. In practice $f$ can be chosen as
a (conditional) log-likelihood ratio. If the alternative to the null hypothesis
is assumed to be a Markov chain on $E^2$ governed by an $E^2 \times E^2$ matrix
of transition probabilities $R$, then we could choose

$f(x,y,x',y') = \log \frac{R_{(x,y),(x',y')}}{P_{x,x'}\,Q_{y,y'}}$.

This score function clearly does not take integer values in general, but one
may choose to consider $\lfloor N f \rfloor$ for suitably large $N$ if integer
scores are preferred.

We find that for this score function $f$ and for $\theta = 1$

$\Phi(1)_{(x,y),(x',y')} = \exp(f(x,y,x',y'))\,P_{x,x'}\,Q_{y,y'} = R_{(x,y),(x',y')}$,

which has row sums equal to 1. Hence $\varphi(1) = 1$ implying that $\theta^* = 1$.
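The observation that a log-likelihood ratio score gives $\theta^* = 1$ is easy to check numerically. The sketch below uses randomly generated transition matrices purely as hypothetical inputs; the helper `random_stochastic` and the alphabet size are assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_stochastic(d):
    """A strictly positive d x d stochastic matrix (hypothetical test input)."""
    M = rng.random((d, d)) + 0.1
    return M / M.sum(axis=1, keepdims=True)


d = 3
P, Q = random_stochastic(d), random_stochastic(d)
R = random_stochastic(d * d)          # alternative: a Markov chain on E^2

null = np.kron(P, Q)                  # null transition matrix on E^2
f = np.log(R / null)                  # f(x, y, x', y'), indexed [(x,y), (x',y')]

Phi_1 = np.exp(1.0 * f) * null        # Phi(theta) at theta = 1
row_sums = Phi_1.sum(axis=1)
spectral_radius = float(np.max(np.abs(np.linalg.eigvals(Phi_1))))
```

Here `Phi_1` coincides with `R`, so its rows sum to one and its spectral radius equals 1, in line with $\varphi(1) = 1$.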

Remark 3.6. The process $(S_n)_{n \ge 0}$ defined by (10) is called a Markov
controlled random walk or a Markov additive process (abbreviated MAP);
see [3], Chapter XI. The process $(T_n)_{n \ge 0}$ defined by

(16)  $T_n = S_n - \min_{0 \le k \le n} S_k$

is called the reflection of the MAP at the zero barrier. It is straightforward
to verify that $(T_n)_{n \ge 0}$ satisfies the recursion

$T_n = \max\{T_{n-1} + f(X_n, Y_n),\ 0\}$

for $n \ge 1$. In addition

(17)  $\max_{0 \le k \le m \le n} (S_m - S_k) = \max_{0 \le m \le n} \{S_m - \min_{0 \le k \le m} S_k\} = \max_{0 \le m \le n} T_m$.

We see that $S_{0,0}^{n} = S_n$ and $T_{n,n} = T_n$. Thus along the main diagonal
in the score matrix $(T_{i,j})_{0 \le i,j \le n}$ we find the reflection of the
MAP $(S_n)_{n \ge 0}$. Along all other diagonals in the score matrix we find the
reflections of MAPs too, these MAPs being defined by shifting the Markov chain
$(X_n)_{n \ge 1}$ along $(Y_n)_{n \ge 1}$. It follows from (17) that (4) indeed
holds. Due to independence and stationarity of the two Markov chains all the
reflected MAPs along diagonals have the same distribution, but they are
dependent. The interpretation of Theorem 3.1 is that asymptotically the number
of excursions exceeding level $t_n$ has the same distribution as if the
reflected MAPs were independent.
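The equivalence between definition (16) and the recursion, together with identity (17), can be checked on a fixed path. The increments below are an arbitrary hypothetical sequence of scores $f(X_k, Y_k)$ and the function name is an assumption.

```python
def reflected_walk(increments):
    """Return S, T via definition (16), and T via the recursion."""
    S = [0]
    for inc in increments:
        S.append(S[-1] + inc)
    # Definition (16): T_n = S_n - min_{0 <= k <= n} S_k
    T_def = [S[n] - min(S[: n + 1]) for n in range(len(S))]
    # Recursion: T_n = max(T_{n-1} + increment, 0)
    T_rec = [0]
    for inc in increments:
        T_rec.append(max(T_rec[-1] + inc, 0))
    return S, T_def, T_rec


increments = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, -1, 1]
S, T_def, T_rec = reflected_walk(increments)

# Left-hand side of (17): the largest rise S_m - S_k with k <= m.
largest_rise = max(S[m] - S[k] for m in range(len(S)) for k in range(m + 1))
```

For this path both constructions give the same reflected sequence, and `largest_rise` equals `max(T_rec)`, which is 3.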



Remark 3.7. The constant $K^*$ is defined in terms of the MAP
$(S_n)_{n \ge 0}$. Let $\tau_-(0) = 0$ and for $k \ge 1$ let

$\tau_-(k) = \inf\{n > \tau_-(k-1) \mid S_n \le S_{\tau_-(k-1)}\}$

denote the times when the MAP descends below its previous minimum.
These stopping times are known as the descending ladder epochs for the
MAP, and they are almost surely finite due to the assumption that $\mu < 0$.
One should note that $\tau_-(k)$ is also the $k$th time that the reflected MAP
$(T_n)_{n \ge 0}$ hits 0. A thorough treatment of the ladder epochs is given in
[1] covering also general state-space Markov chains. From Theorem 1(i) in [1]
it follows that the sampled Markov chain
$(X_{\tau_-(n)}, Y_{\tau_-(n)})_{n \ge 0}$ has a unique invariant probability
distribution, which we will denote by $\nu$. As we consider only a finite
state-space Markov chain, this is also a direct consequence of the
Wiener-Hopf factorization ([3], Theorem XI.2.12). Moreover, the sequence
defined by

$u_{x,y}(n) = \mathbb{P}(T_n = 0, X_n = x, Y_n = y) = \mathbb{P}(\exists k : \tau_-(k) = n, X_n = x, Y_n = y)$

for $x, y \in E$ and $n \ge 1$ forms a renewal sequence and the elementary
renewal theorem ([3], Theorem V.1.4) gives that

(18)  $\frac{1}{n}\sum_{k=1}^{n} u_{x,y}(k) \to \frac{\nu(x,y)}{\mu_-}$

for $n \to \infty$ where $\mu_- = \mathbb{E}_\nu(\tau_-(1))$. We refer to ([5],
Theorem 10.4.3) for a proof that the inverse of the mean recurrence time indeed
is given as the right-hand side limit above.

As stated in Lemma B in [11], when $\mu < 0$ and (7) holds, then

(19)  $\lim_{u \to \infty} \mathbb{P}_{x,y}\Big(\max_{1 \le n \le \tau_-(1)} S_n > u\Big)\exp(\theta^* u) = e(x,y)$

for some constants $e(x,y) > 0$, $x, y \in E$. In terms of these limits the
constant $K^*$ can be represented as

(20)  $K^* = \frac{1}{\mu_-}\sum_{x,y} \nu(x,y)\,e(x,y)$.

As a consequence of Wald's identity for MAPs ([3], Corollary XI.2.6), it holds
that

$\mu_- = \frac{\mathbb{E}_\nu(S_{\tau_-(1)})}{\mu}$,

which shows that (20) is identical to the representation of $K^*$ in (1.26) in
[11]. We refer to [11] for more details and in particular their Section 5 for
issues related to the computation of $K^*$.

Remark 3.8. The function $g \mapsto \log \varphi_i(g)$ is a convex function and
$J_i$ is thus the Fenchel-Legendre transform of the function evaluated in
$2\tilde\pi$. It is possible to identify $J_i$ as the value of a large deviation
rate-function. Considering $J_1$ we introduce the function
$h : E^3 \times E^3 \to \mathbb{R}^{E^2 \times E^2}$ by

$h(x,y,z,x',y',z') = \big(1_{(x,y),(x',y')}(v) + 1_{(x,z),(x',z')}(v)\big)_{v \in E^2 \times E^2}$.

If $(X_n, Y_n, Z_n)_{n \ge 0}$ is a Markov chain with transition probabilities
$P \otimes Q \otimes Q$, the large deviation rate-function for the empirical
average

$\frac{1}{n}\sum_{k=1}^{n} h(X_{k-1}, Y_{k-1}, Z_{k-1}, X_k, Y_k, Z_k)$

evaluated in $2\alpha$ for $\alpha$ a probability measure on $E^2 \times E^2$ is
given as

$I(2\alpha) = \sup_g \{2\alpha(g) - \log \varphi_1(g)\}$;

see Theorem 3.1.2 in [7]. In particular $J_1 = I(2\tilde\pi)$.

Let $\nu$ be a probability measure on the space $E^3 \times E^3$ and define

$\nu^1(x,y,z) = \sum_{x',y',z'} \nu(x,y,z,x',y',z')$,

$\nu^2(x',y',z') = \sum_{x,y,z} \nu(x,y,z,x',y',z')$;

thus $\nu^1$ and $\nu^2$ are marginal probability measures on $E^3$. The measure
$\nu$ is called shift-invariant if $\nu^1 = \nu^2$, and we denote by
$\mathcal{M}$ the set of shift-invariant probability measures on
$E^3 \times E^3$. Considering the Markov chain on $E^3$ with transition
probability matrix $P \otimes Q \otimes Q$, the large deviation rate-function
for the pair-empirical measure (cf. Theorem VI.3 in [8]) is given as

$I_2(\nu) = \sum_{x,y,z,x',y',z'} \nu(x,y,z,x',y',z') \log \frac{\nu(x,y,z,x',y',z')}{\nu^1(x,y,z)\,P_{x,x'}\,Q_{y,y'}\,Q_{z,z'}} = H(\nu \mid \nu^1 \otimes P \otimes Q \otimes Q)$

for $\nu \in \mathcal{M}$. Here $H(\cdot \mid \cdot)$ denotes the relative
entropy. Defining additional marginals

$\nu_{12}(x,y,x',y') = \sum_{z,z'} \nu(x,y,z,x',y',z')$,

$\nu_{13}(x,z,x',z') = \sum_{y,y'} \nu(x,y,z,x',y',z')$,

with $\nu_{12}$ and $\nu_{13}$ being probability measures on $E^2 \times E^2$, it
is a consequence of the contraction principle, Theorem III.20 in [8], that

$J_1 = \inf\{I_2(\nu) \mid \nu \in \mathcal{M} : \nu_{12} + \nu_{13} = 2\tilde\pi\}$.

A similar representation of $J_2$ is of course possible.

If $(X_n)_{n \ge 1}$ and $(Y_n)_{n \ge 1}$ are independent sequences of i.i.d.
variables with the $X$'s having distribution $\pi_1$ and the $Y$'s having
distribution $\pi_2$, then

$\varphi(\theta) = \sum_{x,y \in E} \exp(\theta f(x,y))\,\pi_1(x)\,\pi_2(y)$

is the Laplace transform of the distribution of $f(X_1, Y_1)$, and
$\theta^* > 0$ solves $\varphi(\theta) = 1$. Moreover, $\pi^*$ is the probability
measure on $E \times E$ with point probabilities
$\pi^*(x,y) = \exp(\theta^* f(x,y))\,\pi_1(x)\,\pi_2(y)$. In this case we can
verify that the infimum above is attained for

$\bar\nu(x,y,z,x',y',z') = \frac{\pi^*(x,y)\,\pi^*(x,z)\,\pi^*(x',y')\,\pi^*(x',z')}{\pi_1^*(x)\,\pi_1^*(x')}$,

with $\pi_1^*$ denoting the first marginal of $\pi^*$. To see this first note
that $\bar\nu$ is clearly shift-invariant with the desired marginal property,
$\bar\nu_{12} + \bar\nu_{13} = 2\pi^* \otimes \pi^*$. A simple computation
reveals that

$I_2(\bar\nu) = 2\theta^*\pi^*(f) - \sum_{x} \pi_1^*(x) \log \frac{\pi_1^*(x)}{\pi_1(x)} = 2\theta^*\pi^*(f) - H(\pi_1^* \mid \pi_1)$,

and for any other shift-invariant $\nu$ with the same marginal property one
finds that

$I_2(\nu) = H(\nu \mid \bar\nu) + I_2(\bar\nu)$,

where $H(\nu \mid \bar\nu) \ge 0$, hence

$J_1 = 2\theta^*\pi^*(f) - H(\pi_1^* \mid \pi_1)$.

Similarly we can show that $J_2 = 2\theta^*\pi^*(f) - H(\pi_2^* \mid \pi_2)$.
Since $\theta^*\pi^*(f) = H(\pi^* \mid \pi_1 \otimes \pi_2)$, the condition given
by (12) is equivalent to

(21)  $H(\pi^* \mid \pi_1 \otimes \pi_2) > 2\max\{H(\pi_1^* \mid \pi_1),\ H(\pi_2^* \mid \pi_2)\}$,

which is precisely the condition (E') required in [6] in the i.i.d. case for
$C_n(t_n)$ to be asymptotically Poisson distributed. Since condition (H) in [6]
is equivalent to $\mu < 0$ and (7) in the i.i.d. case, and since condition (E')
actually implies that $f$ does not take the form (15) in the i.i.d. setup, we
conclude that (8) is also fulfilled in the i.i.d. case when assuming (E'); see
Remark 3.3. Thus Theorem 3.1 specializes in the i.i.d. case to Theorem 1 in
[6] with the same conditions.

It is a small nuisance that (12) is not as explicit as condition (21) in the
i.i.d. case, as (12) is given in terms of the values of $J_1$ and $J_2$, which in
turn are the results of an optimization. We showed above how to solve this
optimization problem explicitly in the i.i.d. case, but it does not seem that
there exists such a simple solution for general Markov chains. From a practical
point of view one may notice that taking
$g^*(x,y,x',y') = 3\theta^* f(x',y')/4$, then

(22)  $\max\{\varphi_1(g^*),\ \varphi_2(g^*)\} < 1$

implies (12). Since $\varphi_1(g^*)$ and $\varphi_2(g^*)$ can be computed
numerically we see that (22) provides a usable, sufficient criterion for
Theorem 3.1 to hold.
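A numerical check of criterion (22) might look as follows. This is a sketch under stated assumptions, not the author's implementation: the function names, the bisection used for $\theta^*$ and the two-letter example are hypothetical. States $(x,y,z)$ are flattened to $(x|E| + y)|E| + z$, matching the Kronecker product ordering.

```python
import numpy as np


def spectral_radius(A):
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def phi_0(c, P, Q, f):
    """Spectral radius of exp(c f(x',y')) P (x) Q, i.e. phi(c) of Section 3."""
    return spectral_radius(np.kron(P, Q) * np.exp(c * f.reshape(-1))[None, :])


def phi_1(c, P, Q, f):
    """phi_1(g) for g(x,y,x',y') = c f(x',y'); states (x, y, z)."""
    w = f[:, :, None] + f[:, None, :]            # f(x',y') + f(x',z')
    base = np.kron(P, np.kron(Q, Q))
    return spectral_radius(base * np.exp(c * w.reshape(-1))[None, :])


def phi_2(c, P, Q, f):
    """phi_2(g) for g(x,y,x',y') = c f(x',y'); states (x, w, y)."""
    w = f[:, None, :] + f[None, :, :]            # f(x',y') + f(w',y')
    base = np.kron(P, np.kron(P, Q))
    return spectral_radius(base * np.exp(c * w.reshape(-1))[None, :])


def solve_theta_star(P, Q, f):
    """Positive root of phi_0(theta) = 1 by bisection (assumes mu < 0 and (7))."""
    hi = 1.0
    while phi_0(hi, P, Q, f) < 1.0:
        hi *= 2.0
    lo = hi
    while phi_0(lo, P, Q, f) >= 1.0:
        lo /= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi_0(mid, P, Q, f) < 1.0 else (lo, mid)
    return 0.5 * (lo + hi)


def criterion_22(P, Q, f):
    """True if max{phi_1(g*), phi_2(g*)} < 1 with g* = 3 theta* f / 4."""
    c = 0.75 * solve_theta_star(P, Q, f)
    return max(phi_1(c, P, Q, f), phi_2(c, P, Q, f)) < 1.0


# Hypothetical two-letter example.
P = np.array([[0.7, 0.3], [0.4, 0.6]])
Q = np.array([[0.5, 0.5], [0.2, 0.8]])
f = np.array([[1.0, -1.0], [-1.0, 1.0]])
ok = criterion_22(P, Q, f)
```

If `ok` is `True`, condition (12) holds for the example; if it is `False`, nothing can be concluded, since (22) is only sufficient.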

4. The counting construction. We will show that $C_n(t_n)$ is asymptotically
Poisson distributed by constructing another counting variable, which
equals $C_n(t_n)$ with probability tending to 1, and for which we can verify the
conditions given in Theorem 1 in [2].

We need to introduce some notation. Let

$I = \{(i,j) \mid 0 \le i, j \le n - 1\}$;

then for each $a = (i,j) \in I$ and $\delta > 0$ we define the (pair) empirical
measure $\varepsilon_{a,\delta}$ by

$\varepsilon_{a,\delta}((x,y),(x',y')) = \frac{1}{\delta}\sum_{k=1}^{\delta} 1_{(x,y),(x',y')}\big((X_{i+k-1}, Y_{j+k-1}), (X_{i+k}, Y_{j+k})\big)$

for $(x,y), (x',y') \in E^2$. With abuse of notation we will in the following
also use $f$ to denote the function defined on $E^2 \times E^2$ by
$(x,y,x',y') \mapsto f(x',y')$. Then

$\delta\,\varepsilon_{a,\delta}(f) = \sum_{k=1}^{\delta} f(X_{i+k}, Y_{j+k}) = S_{i,j}^{\delta}$.

Let $d$ denote the total variation metric on the set of probability measures
on $E^2 \times E^2$. Then for $a \in I$ and for any $t > 0$, $\eta > 0$ and
integer $l > 0$ define the variable

$V_a = V_a(t,l,\eta) = 1\Big(T_a = 0,\ \max_{\delta:\ \delta \le \Delta(a) \wedge l \text{ and } d(\varepsilon_{a,\delta},\tilde\pi) < \eta} \delta\,\varepsilon_{a,\delta}(f) > t\Big)$.

We should observe that the counting variable $C_n(t)$ has the following
representation:

(23)  $C_n(t) = \sum_{a \in I} 1\Big(T_a = 0,\ \max_{\delta \le \Delta(a)} \delta\,\varepsilon_{a,\delta}(f) > t\Big)$.

We show in Section 5.8 that in the setup of the present paper, for a suitable
choice of $l_n$ and $\eta > 0$, then

(24)  $\mathbb{P}\Big(\sum_{a \in I} V_a(t_n, l_n, \eta) \ne C_n(t_n)\Big) \to 0$

when $n \to \infty$. The reason for introducing the $l$-restriction is to be able
to control the dependencies between the $V_a$-variables better. The reason for
the restriction on the empirical measures is more subtle, and we give a
discussion of this in Section 6.

As mentioned, we prove that $\sum_{a \in I} V_a$ is asymptotically Poisson
distributed by applying Theorem 1 in [2], which is based on the Chen-Stein
method.

We assume that a subset $B_a \subseteq I$ is given for all $a \in I$. This set
$B_a$ is called the neighborhood of strong dependence of $V_a$, and in the proof
of Lemma 5.16 we make a concrete choice of $B_a$. Furthermore, for $a \in I$ let

$\mathcal{F}_a = \sigma(V_b \mid b \notin B_a)$

denote the $\sigma$-algebra generated by those variables $V_b$ not in the
neighborhood of strong dependence of $V_a$.
Rephrasing Theorem 1 in [2] gives:

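Theorem 4.1. Suppose that $(l_n)_{n \ge 1}$, $(t_n)_{n \ge 1}$ and $\eta > 0$ are
chosen such that for some sequence $(\lambda_n)_{n \ge 1}$

(25)  $\beta_{1,n} = \Big|\sum_{a \in I} \mathbb{E}(V_a) - \lambda_n\Big| \to 0$

for $n \to \infty$; and suppose that

(26)  $\beta_{2,n} = \sum_{a \in I}\sum_{b \in B_a} \mathbb{E}(V_a)\,\mathbb{E}(V_b) \to 0$,

(27)  $\beta_{3,n} = \sum_{a \in I}\sum_{b \in B_a,\ b \ne a} \mathbb{E}(V_a V_b) \to 0$,

(28)  $\beta_{4,n} = \sum_{a \in I} \mathbb{E}\big|\mathbb{E}(V_a \mid \mathcal{F}_a) - \mathbb{E}(V_a)\big| \to 0$,

for $n \to \infty$; then

(29)  $\Big\|\mathcal{D}\Big(\sum_{a \in I} V_a\Big) - \mathrm{Poi}(\lambda_n)\Big\| \to 0$.

In fact, the bound

$\Big\|\mathcal{D}\Big(\sum_{a \in I} V_a\Big) - \mathrm{Poi}(\lambda_n)\Big\| \le \beta_{1,n} + 2(\beta_{2,n} + \beta_{3,n} + \beta_{4,n})$

always holds.

As a direct consequence, using the coupling inequality, we have the following
corollary.

Corollary 4.2. If (29) holds and (24) is fulfilled also, then

(30)  $\|\mathcal{D}(C_n(t_n)) - \mathrm{Poi}(\lambda_n)\| \to 0$

and

(31)  $\mathbb{P}(\mathcal{M}_n \le t_n) - \exp(-\lambda_n) \to 0$.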



5. Proofs. The proof of Theorem 3.1 is divided into a number of lemmas.
We need to verify the conditions in Theorem 4.1, and to this end we need
bounds on the expectations
$\mathbb{E}(V_a V_b) = \mathbb{P}(V_a = 1, V_b = 1)$ for $b \in B_a$ and
$a \ne b$. This is the subject of the following subsections and the most
difficult part of the proof. In Section 5.8 we collect the bounds obtained to
prove that the conditions of Theorem 4.1 are fulfilled when aligning independent
Markov chains under the assumptions given in Theorem 3.1 and we show that (24)
holds.

For $a, b \in I$ we always have that

(32)  $\mathbb{E}(V_a V_b) \le \mathbb{P}\Big(\max_{\delta:\ \delta \le \Delta(a) \wedge l,\ d(\varepsilon_{a,\delta},\tilde\pi) < \eta} \delta\,\varepsilon_{a,\delta}(f) > t,\ \max_{\delta:\ \delta \le \Delta(b) \wedge l,\ d(\varepsilon_{b,\delta},\tilde\pi) < \eta} \delta\,\varepsilon_{b,\delta}(f) > t\Big)$
      $\le l^2 \max_{1 \le \delta_1, \delta_2 \le l} \mathbb{P}\big(\delta_1\varepsilon_{a,\delta_1}(f) > t,\ d(\varepsilon_{a,\delta_1},\tilde\pi) < \eta,\ \delta_2\varepsilon_{b,\delta_2}(f) > t,\ d(\varepsilon_{b,\delta_2},\tilde\pi) < \eta\big)$.

To bound $\mathbb{E}(V_a V_b)$ we thus need to bound the probability on the
right-hand side above. The same $X$- and $Y$-variables may enter both of the
empirical measures in two essentially different ways. Either variables from
both sequences enter both empirical measures or only variables from one
sequence enter both empirical measures. These two different cases need
different treatment. To give an exhaustive treatment of the different ways that
such a sharing of variables can be arranged becomes unreasonably complicated, so
we choose to treat the two essentially different cases for a specific
arrangement of the sharing of variables in sufficient detail for the reader to
be convinced that all other arrangements can be treated similarly.

5.1. Positive functionals of a Markov chain. We make a useful and general
observation on how to bound the expectation of positive functionals of
a Markov chain. It allows us to assume parts of the same Markov chain to
be independent, stationary versions at the expense of a constant factor.

Lemma 5.1. Let $Z = (Z_n)_{n \ge 0}$ be an irreducible Markov chain on a finite
state space $F$ and let $0 = k_1 < \cdots < k_N < \infty$ be given. Then there
exists a constant $\rho_N$ such that if $(Z^i_n)_{n \ge k_i}$ for
$i = 1, \ldots, N$ ($k_{N+1} = \infty$) are $N$ independent stationary Markov
chains with the same transition probabilities as $Z$, and
$\tilde Z = (\tilde Z_n)_{n \ge 0}$ is given by $\tilde Z_n = Z^i_n$ if
$k_i \le n < k_{i+1}$, then for a positive functional

$\Lambda : F^{\mathbb{N}_0} \to [0, \infty)$

it holds that

(33)  $\mathbb{E}(\Lambda(Z)) \le \rho_N\,\mathbb{E}(\Lambda(\tilde Z))$.

The constant $\rho_N$ does not depend on the actual initial distribution of $Z$
nor on the functional $\Lambda$.

Proof. Assume $N = 2$. The general result follows by induction. Assume
first that $Z$ is stationary and that $(Z^1_n)_{n \ge 0}$ and
$(Z^2_n)_{n \ge k_2}$ are independent and stationary. Then $Z$ has the same
distribution as $\tilde Z$ conditionally on $Z^1_{k_2} = Z^2_{k_2}$; hence using
that $\Lambda$ is a positive functional

$\mathbb{E}(\Lambda(Z)) = \frac{\mathbb{E}(\Lambda(\tilde Z);\ Z^1_{k_2} = Z^2_{k_2})}{\mathbb{P}(Z^1_{k_2} = Z^2_{k_2})} \le \rho\,\mathbb{E}(\Lambda(\tilde Z))$

with $\rho = \big(\sum_{x \in F} \pi_x^2\big)^{-1}$ where $\pi$ is the invariant
distribution.

If $Z$ is nonstationary with initial distribution $\nu$, say, we have that

$\mathbb{E}_\nu(\Lambda(Z)) = \sum_{x} \frac{\nu_x}{\pi_x}\,\pi_x\,\mathbb{E}_x(\Lambda(Z)) \le \frac{1}{\min_x \pi_x}\,\mathbb{E}_\pi(\Lambda(Z))$.

So $\rho_2 = \rho / \min_x \pi_x$ will do. In general
$\rho_N = \rho^{N-1} / \min_x \pi_x$ can be used. □

5.2. Exponential change of measure. Let $Z = (Z_n)_{n \ge 0}$ be a Markov chain
on a finite state space $F$ with transition probabilities $R$. Assume that $R$
is irreducible, and assume that $g : F \times F \to \mathbb{R}$ is a given
function. Then we define the matrix $\Psi(g)$ with positive entries by

$\Psi(g)_{x,y} = \exp(g(x,y))\,R_{x,y}$

with spectral radius $\psi(g)$ and corresponding right eigenvector
$r^g = (r^g(x))_{x \in F}$. Due to irreducibility of $\Psi(g)$ this eigenvector
has strictly positive entries. With

$g_n(Z) = \sum_{k=1}^{n} g(Z_{k-1}, Z_k)$

we define the process $(L^g_n)_{n \ge 0}$ by

$L^g_n = \frac{r^g(Z_n)\exp(g_n(Z))}{r^g(Z_0)\,\psi(g)^n}$.

Then with $(\mathcal{F}_n)_{n \ge 0}$ the filtration of $\sigma$-algebras
generated by the Markov chain it follows that

(34)  $\mathbb{E}(L^g_n \mid \mathcal{F}_{n-1}) = \frac{\exp(g_{n-1}(Z))}{r^g(Z_0)\,\psi(g)^n}\,\mathbb{E}\big(r^g(Z_n)\exp(g(Z_{n-1}, Z_n)) \mid Z_{n-1}\big) = \frac{\exp(g_{n-1}(Z))}{r^g(Z_0)\,\psi(g)^n}\,\psi(g)\,r^g(Z_{n-1}) = L^g_{n-1}$.

This shows that $(L^g_n, \mathcal{F}_n)_{n \ge 0}$ is a martingale, for which
$L^g_n > 0$ and $\mathbb{E}(L^g_n) = \mathbb{E}(L^g_0) = 1$. A probability
measure $\mathbb{P}^g_n$ on $\mathcal{F}_n$ is then defined to have
Radon-Nikodym derivative $L^g_n$ w.r.t. the restriction of $\mathbb{P}$ to
$\mathcal{F}_n$. These measures can be extended to a single measure
$\mathbb{P}^g$, the exponentially changed or exponentially tilted measure,
under which $(Z_n)_{n \ge 0}$ is a Markov chain ([3], Theorem XIII.8.1) with
transition probabilities

$R^g_{x,y} = \frac{r^g(y)}{r^g(x)\,\psi(g)}\exp(g(x,y))\,R_{x,y}$.

We should observe that since the eigenvector fraction is bounded below
by a strictly positive constant, and bounded above as well, then
$\mathbb{E}(L^g_n) = 1$ implies that

(35)  $\frac{1}{n}\log \mathbb{E}(\exp(g_n(Z))) \to \log \psi(g)$

for $n \to \infty$.

If we return to the setup of the present paper with $F = E^2$, $g = \theta f$
for $\theta \in \mathbb{R}$, and the Markov chain being
$Z = (X_n, Y_n)_{n \ge 0}$, then

$g_n(Z) = \sum_{k=1}^{n} \theta f(X_k, Y_k) = \theta S_n$,

and we find that the matrix $R^*$ introduced in Section 3 is precisely the
matrix of transition probabilities for the Markov chain under the exponentially
changed measure $\mathbb{P}^{\theta^* f}$. We will denote this measure simply by
$\mathbb{P}^*$. Note that the exponential change of measure does not change the
distribution $\pi$ of $(X_0, Y_0)$ whereas the invariant measure $\pi^*$ for
$R^*$ typically differs from $\pi$. The measure under which
$(X_n, Y_n)_{n \ge 0}$ is a stationary Markov chain with transition
probabilities $R^*$ will be denoted $\mathbb{P}^*_{\pi^*}$. We use
$\mathbb{E}^*$ to denote expectations under $\mathbb{P}^*$.

If we define the stopping time $\tau = \inf\{n \ge 0 \mid S_n > t\}$, then an
easy consequence of the exponential change of measure technique is, according
to [3], Theorem XIII.3.2, the following Lundberg-type inequality: For any event
$G \in \mathcal{F}_\tau$ with $G \subseteq (\tau < \infty)$

(36)  $\mathbb{P}(G) = \mathbb{E}^*\Big(\frac{1}{L^{\theta^* f}_\tau};\ G\Big) \le K\exp(-\theta^* t)$.

The inequality follows from $L^{\theta^* f}_\tau \ge K^{-1}\exp(\theta^* t)$ on
$(\tau < \infty)$, where $K$ bounds the eigenvector fraction.

5.3. Variables shared in one sequence. Let $g : E^2 \times E^2 \to \mathbb{R}$ be a function
and let $r_i^g = (r_i^g(x,y,z))$ denote the right eigenvector for $\Phi_i(g)$ with
eigenvalue $\varphi_i(g)$ for $i = 1, 2$, respectively. As above, due to irreducibility, all
coordinates of these vectors are strictly positive.

In this section we derive a result corresponding to variables shared from
the X-sequence only, and we thus use the $\Phi_1$ matrix. Similar derivations for
variables shared from the Y-sequence only using $\Phi_2$ are possible.

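Fix $i \le \delta_1$ and $\delta_2 \ge \delta_1 - i$ and define the functions

$\sigma_1((x_k)_k, (y_k)_k) = \sum_{k=1}^{i} g(x_{k-1}, y_{k-1}, x_k, y_k),$

$\sigma_2((x_k)_k, (y_k)_k, (z_k)_k) = \sum_{k=i+1}^{\delta_1} \left[ g(x_{k-1}, y_{k-1}, x_k, y_k) + g(x_{k-1}, z_{k-1}, x_k, z_k) \right],$

$\sigma_3((x_k)_k, (z_k)_k) = \sum_{k=\delta_1+1}^{i+\delta_2} g(x_{k-1}, z_{k-1}, x_k, z_k).$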

Let $\Phi_0(g)$ denote the matrix

$\Phi_0(g)_{(x,y),(x',y')} = \exp(g(x, y, x', y')) P_{x,x'} Q_{y,y'}$

and $\varphi_0(g)$ the spectral radius. Let $r_0^g$ denote the corresponding right eigenvector.
We define a positive functional $C^g$ on $(E^3)^{\mathbb{N}_0}$ by

$C^g = \frac{r_0^g(x_i, y_i) \exp(\sigma_1)}{r_0^g(x_0, y_0) \varphi_0(g)^{i}} \cdot \frac{r_1^g(x_{\delta_1}, y_{\delta_1}, z_{\delta_1}) \exp(\sigma_2)}{r_1^g(x_i, y_i, z_i) \varphi_1(g)^{\delta_1 - i}} \cdot \frac{r_0^g(x_{i+\delta_2}, z_{i+\delta_2}) \exp(\sigma_3)}{r_0^g(x_{\delta_1}, z_{\delta_1}) \varphi_0(g)^{i + \delta_2 - \delta_1}}.$

Assume that $(Z_n)_{n \ge 1}$ is a stationary Markov chain with transition
probabilities $Q$ independent of $(X_n, Y_n)_{n \ge 1}$, and let $\mathcal{X} = (X_n)_{n \ge 1}$, $\mathcal{Y} = (Y_n)_{n \ge 1}$
and $\mathcal{Z} = (Z_n)_{n \ge 1}$. Introduce also $\mathcal{Y}^T = (Y_{T+n})_{n \ge 1}$ as the $T$-shift of $\mathcal{Y}$ for
$T \ge 1$.

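Lemma 5.2. It holds that

(37) $\mathbb{E}(C^g(\mathcal{X}, \mathcal{Y}, \mathcal{Z})) = 1,$

and, furthermore, there exists a constant $\rho > 0$ such that

(38) $\mathbb{E}(C^g(\mathcal{X}, \mathcal{Y}, \mathcal{Y}^T)) \le \rho$

whenever $i + T \ge \delta_1 + 1$.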

Proof. The first part of the lemma follows by repeating the arguments 
in (34) three times corresponding to making three different, successive ex- 
ponential changes of measures. The second claim follows by Lemma 5.1. 
□ 

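We restrict our attention to the case where $i + T \ge \delta_1 + 1$, so that there
is no overlap in the Y-sequence. Let $\varepsilon_1 = \varepsilon_{(0,0),\delta_1}$ and $\varepsilon_2 = \varepsilon_{(i,i+T),\delta_2}$.

Lemma 5.3. For any $g : E^2 \times E^2 \to \mathbb{R}$ and $\varepsilon > 0$ there exist constants
$\eta, K > 0$ such that for all $s > 0$,

$\mathbb{P}\left( \delta_1 \varepsilon_1(f) > s,\ d(\varepsilon_1, \tilde\pi) \le \eta,\ \delta_2 \varepsilon_2(f) > s,\ d(\varepsilon_2, \tilde\pi) \le \eta \right) \le K \exp\left( -s \left( \frac{2\tilde\pi(g) - \log \varphi_1(g)}{\pi^*(f)} - \varepsilon \right) \right).$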

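Proof. First we show that $\log \varphi_1(g) \ge 2 \log \varphi_0(g)$. Let

$g_n(\mathcal{X}, \mathcal{Y}) = \sum_{k=1}^{n} g(X_{k-1}, Y_{k-1}, X_k, Y_k)$

and define $g_n(\mathcal{X}, \mathcal{Z})$ likewise. Let $\mathbb{E}_{\mathcal{X}}$, $\mathbb{E}_{\mathcal{Y}}$ and $\mathbb{E}_{\mathcal{Z}}$ denote the expectation
operators where we only integrate w.r.t. the distribution of $\mathcal{X}$, $\mathcal{Y}$ or $\mathcal{Z}$,
respectively. Introduce

$p_n(\mathcal{X}) = \mathbb{E}_{\mathcal{Y}}(\exp(g_n(\mathcal{X}, \mathcal{Y})));$

then by (35) and Tonelli

$\log \varphi_0(g) = \lim_{n \to \infty} \frac{1}{n} \log \mathbb{E}_{\mathcal{X}}(p_n(\mathcal{X})).$

Using Tonelli again and Jensen's inequality, and that $\mathcal{Y}$ and $\mathcal{Z}$ are independent
and identically distributed, we find that

$\mathbb{E}(\exp(g_n(\mathcal{X}, \mathcal{Y}) + g_n(\mathcal{X}, \mathcal{Z}))) = \mathbb{E}_{\mathcal{X}}\left( \mathbb{E}_{\mathcal{Y}}(\exp(g_n(\mathcal{X}, \mathcal{Y})))\, \mathbb{E}_{\mathcal{Z}}(\exp(g_n(\mathcal{X}, \mathcal{Z}))) \right) = \mathbb{E}_{\mathcal{X}}(p_n(\mathcal{X})^2) \ge \mathbb{E}_{\mathcal{X}}(p_n(\mathcal{X}))^2.$

Using (35) again gives

$\log \varphi_1(g) = \lim_{n \to \infty} \frac{1}{n} \log \mathbb{E}(\exp(g_n(\mathcal{X}, \mathcal{Y}) + g_n(\mathcal{X}, \mathcal{Z}))) \ge 2 \lim_{n \to \infty} \frac{1}{n} \log \mathbb{E}_{\mathcal{X}}(p_n(\mathcal{X})) = 2 \log \varphi_0(g).$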

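Since $2(\delta_1 - i) + i + (i + \delta_2 - \delta_1) = \delta_1 + \delta_2$ and $\sigma_1 + \sigma_2 + \sigma_3 = \delta_1 \varepsilon_1(g) + \delta_2 \varepsilon_2(g)$,
the inequality $\log \varphi_1(g) \ge 2 \log \varphi_0(g)$ gives that

$C^g(\mathcal{X}, \mathcal{Y}, \mathcal{Y}^T) \ge \gamma \exp\left( \delta_1 \varepsilon_1(g) + \delta_2 \varepsilon_2(g) - (\delta_1 + \delta_2) \log \varphi_1(g) / 2 \right)$

with $\gamma > 0$ a lower bound on the eigenvector fractions. We may assume that
$2\tilde\pi(g) - \log \varphi_1(g) > \pi^*(f)\varepsilon$ since the result is trivial otherwise. Then we can
find $\varepsilon' > 0$ such that

$\frac{2(\tilde\pi(g) - \varepsilon') - \log \varphi_1(g)}{\pi^*(f) + \varepsilon'} = \frac{2\tilde\pi(g) - \log \varphi_1(g)}{\pi^*(f)} - \varepsilon$

and choose $\eta$ so small that for $\nu$ a probability measure on $E^2 \times E^2$ with
$d(\nu, \tilde\pi) \le \eta$ we have $|\nu(g) - \tilde\pi(g)| < \varepsilon'$ and $|\nu(f) - \pi^*(f)| < \varepsilon'$. On the event

$A = \left( \delta_1 \varepsilon_1(f) > s,\ d(\varepsilon_1, \tilde\pi) \le \eta,\ \delta_2 \varepsilon_2(f) > s,\ d(\varepsilon_2, \tilde\pi) \le \eta \right)$

we see that

$\delta_1 \varepsilon_1(g) + \delta_2 \varepsilon_2(g) - (\delta_1 + \delta_2) \log \varphi_1(g) / 2 \ge \frac{\delta_1 + \delta_2}{2} \left( 2(\tilde\pi(g) - \varepsilon') - \log \varphi_1(g) \right) \ge s \left( \frac{2\tilde\pi(g) - \log \varphi_1(g)}{\pi^*(f)} - \varepsilon \right)$

since on $A$ we have $\delta_1 + \delta_2 \ge 2s / (\pi^*(f) + \varepsilon')$. Hence

$\mathbb{P}(A) = \mathbb{E}\left( \frac{C^g(\mathcal{X}, \mathcal{Y}, \mathcal{Y}^T)}{C^g(\mathcal{X}, \mathcal{Y}, \mathcal{Y}^T)}; A \right) \le \gamma^{-1} \exp\left( -s \left( \frac{2\tilde\pi(g) - \log \varphi_1(g)}{\pi^*(f)} - \varepsilon \right) \right) \mathbb{E}(C^g(\mathcal{X}, \mathcal{Y}, \mathcal{Y}^T); A) \le \rho \gamma^{-1} \exp\left( -s \left( \frac{2\tilde\pi(g) - \log \varphi_1(g)}{\pi^*(f)} - \varepsilon \right) \right),$

where the first inequality follows by bounding the denominator from below
using the inequalities obtained above, and the second inequality follows from
Lemma 5.2. □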



If the condition (12) is fulfilled, then $2J_1 > 3\theta^* \pi^*(f)$ and we can in
particular choose a function $g$ and an $\varepsilon > 0$ sufficiently small such that

$2\tilde\pi(g) - \log \varphi_1(g) > (3\theta^*/2 + 2\varepsilon) \pi^*(f).$

The following corollary is therefore a direct consequence of Lemma 5.3.

Corollary 5.4. If (12) is fulfilled, there exist constants $\varepsilon, \eta, K > 0$
such that for all $s > 0$

(39) $\mathbb{P}\left( \delta_1 \varepsilon_1(f) > s,\ d(\varepsilon_1, \tilde\pi) \le \eta,\ \delta_2 \varepsilon_2(f) > s,\ d(\varepsilon_2, \tilde\pi) \le \eta \right) \le K \exp(-(3\theta^*/2 + \varepsilon)s).$

The result in Corollary 5.4 gives a prototypical inequality under the
assumption $2J_1 > 3\theta^* \pi^*(f)$ when only variables from the X-sequence enter
both of the empirical measures. If only variables from the Y-sequence enter
both empirical measures, a similar inequality is obtained under the
assumption that $2J_2 > 3\theta^* \pi^*(f)$.

5.4. A uniform large deviation result. To handle the case with variables 
shared from both sequences we need a special large deviation result for 
Markov chains that we will derive in this section. We first state the useful 
Azuma-Hoeffding inequality for martingales with bounded increments; see 
Lemma 1.5 in [13] or Theorem 1.3.1 in [19]. 

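Lemma 5.5. If $(Z_n, \mathcal{F}_n)_{n \ge 0}$ is a mean-zero martingale with $Z_0 = 0$ such
that for all $n \ge 1$

$|Z_n - Z_{n-1}| \le c_n$

for some sequence $(c_n)_{n \ge 1}$, then for $\lambda > 0$

$\mathbb{P}(Z_n \ge \lambda) \le \exp\left( -\frac{\lambda^2}{2 \sum_{k=1}^{n} c_k^2} \right).$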
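As a sanity check of the Azuma-Hoeffding bound, the sketch below compares it with a Monte Carlo estimate for the simple symmetric random walk, a mean-zero martingale with increments bounded by 1. The function names, the choice of walk and the simulation parameters are assumptions made for illustration; they are not part of the paper.

```python
import math
import random

def azuma_bound(lam, c):
    """Right-hand side exp(-lam^2 / (2 * sum_k c_k^2)) of the inequality."""
    return math.exp(-lam ** 2 / (2.0 * sum(ck * ck for ck in c)))

def empirical_tail(n, lam, trials=5000, seed=1):
    """Monte Carlo estimate of P(Z_n >= lam) for a +/-1 random walk,
    which is a mean-zero martingale with |Z_k - Z_{k-1}| = 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        z = sum(rng.choice((-1, 1)) for _ in range(n))
        if z >= lam:
            hits += 1
    return hits / trials

n, lam = 100, 20
bound = azuma_bound(lam, [1.0] * n)   # equals exp(-2) here
estimate = empirical_tail(n, lam)     # should lie well below the bound
```

The bound is not tight for this walk, which is typical: it only uses the size of the increments, not their distribution.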



Fix $j \ge 1$ and let in this section $(X_n, Y_n)_{n=1}^{j}$ be a stationary, aperiodic
and irreducible Markov chain with transition probabilities given by $R$ and
invariant distribution $\pi_R$. Let $(Y_n)_{n \ge j+1}$ be an independent, stationary,
aperiodic and irreducible Markov chain with transition probabilities given by $Q$
and invariant distribution $\pi_Q$. For an $E^2 \times E^2$ matrix $G$ define the norm of
the matrix as

$\|G\|_\infty = \max_{(x,y)} \sum_{(z,w)} |G_{(x,y),(z,w)}|.$

With $\mathbf{1}$ the column vector of 1's, the matrix $R^k$ converges to $\mathbf{1}\pi_R$ due to
irreducibility and aperiodicity, and since the rate of convergence is sufficiently
fast, in fact geometric, we have that

$\sum_{k=0}^{\infty} \|R^k - \mathbf{1}\pi_R\|_\infty < \infty.$

For an $E^2$ vector $f$ we let $\|f\|_\infty = \max_{(x,y)} |f(x,y)|$ denote the max-norm.
Then clearly for any $E^2 \times E^2$ matrix $G$, with $G(f)$ the matrix product of $G$
with the vector $f$, $\|G(f)\|_\infty \le \|f\|_\infty \|G\|_\infty$, and especially

$\|R^k(f) - \mathbf{1}\pi_R(f)\|_\infty \le \|f\|_\infty \|R^k - \mathbf{1}\pi_R\|_\infty.$

For $T \ge 1$ a fixed constant we want to give an exponential bound of the
probability

(40) $\mathbb{P}\left( \sum_{k=1}^{j} f(X_k, Y_{k+T}) \ge \sum_{k=1}^{j} f(X_k, Y_k) \right)$

if $\mathbb{E}(f(X_k, Y_{k+T})) < \mathbb{E}(f(X_k, Y_k))$ for all $k$. This is achieved by introducing a
relevant martingale and then using the Azuma-Hoeffding inequality.

Let $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and for $n \ge 1$ let $\mathcal{F}_n$ denote the $\sigma$-algebra generated by
$X_1, \ldots, X_n, Y_1, \ldots, Y_n$ together with $Y_{j+1}, \ldots, Y_{n+T}$ if $n + T > j$. Define

$S_{j,T} = \sum_{k=1}^{j} [f(X_k, Y_{k+T}) - f(X_k, Y_k)] \quad (S_{0,T} = 0),$

and with $\xi_{j,T} = \mathbb{E}(S_{j,T})$ let

(41) $Z_n = \mathbb{E}(S_{j,T} - \xi_{j,T} \mid \mathcal{F}_n).$

Then $(Z_n, \mathcal{F}_n)_{n=0}^{j}$ is a mean-zero martingale with $Z_0 = 0$ (depending on $T$,
though we have suppressed this in the notation). Notice that $Z_j = S_{j,T} - \xi_{j,T}$.
The following lemma shows that the martingale differences

$|Z_n - Z_{n-1}| = |\mathbb{E}(S_{j,T} \mid \mathcal{F}_n) - \mathbb{E}(S_{j,T} \mid \mathcal{F}_{n-1})|$

are uniformly bounded by a constant.

Lemma 5.6. There exists a constant $\eta$ independent of $j$ and $T$ such that

(42) $|Z_n - Z_{n-1}| \le \eta.$

Here $\eta$ can be chosen as

(43) $\eta = 6 \|f\|_\infty \sum_{k=0}^{\infty} \|R^k - \mathbf{1}\pi_R\|_\infty.$

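Proof. The Markov property gives that for $n \le k \le j$

$\mathbb{E}(f(X_k, Y_k) \mid \mathcal{F}_n) = R^{k-n}(f)(X_n, Y_n).$

Define the function $\bar f$ by

$\bar f(x, y) = R^T(f(x, \cdot))(x, y) = \sum_{z,w} f(x, w) R^T_{(x,y),(z,w)},$

and for $n \le k \le j$ define $f_{k,n}$ by

$f_{k,n}(x, y) = \begin{cases} \sum_z f(x, z) Q^{k-n}_{y,z}, & \text{if } n + T > j, \\ \sum_z f(x, z) \pi_Q(z), & \text{if } n + T \le j. \end{cases}$

Then

$\mathbb{E}(f(X_k, Y_{k+T}) \mid \mathcal{F}_n) = \begin{cases} R^{k-n}(\bar f)(X_n, Y_n), & k \in C_1, \\ R^{k-n}(f_{k,n}(\cdot, Y_{n+T}))(X_n, Y_n), & k \in C_2, \\ R^{k+T-n}(f(X_k, \cdot))(X_n, Y_n), & k \in C_3, \end{cases}$

where

$C_1 = \{k \mid n \le k < k + T \le j\},$
$C_2 = \{k \mid n \le k \le j < k + T\},$
$C_3 = \{k \mid n - T \le k < n \le k + T \le j\}.$

Observing that

$\mathbb{E}(S_{j,T} \mid \mathcal{F}_n) = \sum_{k=1}^{j} \mathbb{E}(f(X_k, Y_{k+T}) \mid \mathcal{F}_n) - \sum_{k=1}^{j} \mathbb{E}(f(X_k, Y_k) \mid \mathcal{F}_n)$

and subtracting $\mathbb{E}(S_{j,T} \mid \mathcal{F}_{n-1})$ from this, the martingale difference $Z_n - Z_{n-1}$
is seen to be the sum of the following two terms:

$t_1 = \sum_{k=n-T}^{j} [\mathbb{E}(f(X_k, Y_{k+T}) \mid \mathcal{F}_n) - \mathbb{E}(f(X_k, Y_{k+T}) \mid \mathcal{F}_{n-1})],$

$t_2 = \sum_{k=n}^{j} [\mathbb{E}(f(X_k, Y_k) \mid \mathcal{F}_{n-1}) - \mathbb{E}(f(X_k, Y_k) \mid \mathcal{F}_n)].$

Since

$|\mathbb{E}(f(X_k, Y_k) \mid \mathcal{F}_n) - \pi_R(f)| = |R^{k-n}(f)(X_n, Y_n) - \pi_R(f)| \le \|f\|_\infty \|R^{k-n} - \mathbf{1}\pi_R\|_\infty,$

the term $t_2$ is controlled by the following inequality:

(44) $|t_2| \le 2\|f\|_\infty \sum_{k=n}^{j} \|R^{k-n} - \mathbf{1}\pi_R\|_\infty \le 2\|f\|_\infty \sum_{k=0}^{\infty} \|R^k - \mathbf{1}\pi_R\|_\infty.$

Noting that $\|\bar f\|_\infty, \|f_{k,n}(\cdot, y)\|_\infty, \|f(x, \cdot)\|_\infty \le \|f\|_\infty$ we observe that for
$k \in C_1$,

$|\mathbb{E}(f(X_k, Y_{k+T}) \mid \mathcal{F}_n) - \pi_R(\bar f)| \le \|f\|_\infty \|R^{k-n} - \mathbf{1}\pi_R\|_\infty,$

for $k \in C_2$,

$|\mathbb{E}(f(X_k, Y_{k+T}) \mid \mathcal{F}_n) - \pi_R(f_{k,n}(\cdot, Y_{n+T}))| \le \|f\|_\infty \|R^{k-n} - \mathbf{1}\pi_R\|_\infty,$

and for $k \in C_3$,

$|\mathbb{E}(f(X_k, Y_{k+T}) \mid \mathcal{F}_n) - \pi_R(f(X_k, \cdot))| \le \|f\|_\infty \|R^{k+T-n} - \mathbf{1}\pi_R\|_\infty.$

Since the three inequalities above also hold when conditioning on $\mathcal{F}_{n-1}$ we
obtain

$\sum_{k \in C_1 \cup C_2 \cup C_3} |\mathbb{E}(f(X_k, Y_{k+T}) \mid \mathcal{F}_n) - \mathbb{E}(f(X_k, Y_{k+T}) \mid \mathcal{F}_{n-1})| \le 2\|f\|_\infty \sum_{k \in C_1 \cup C_2} \|R^{k-n} - \mathbf{1}\pi_R\|_\infty + 2\|f\|_\infty \sum_{k \in C_3} \|R^{k+T-n} - \mathbf{1}\pi_R\|_\infty \le 4\|f\|_\infty \sum_{k=0}^{\infty} \|R^k - \mathbf{1}\pi_R\|_\infty.$

Finally, if $n - T \le k < n \le j < k + T$, then

$\mathbb{E}(f(X_k, Y_{k+T}) \mid \mathcal{F}_n) = \mathbb{E}(f(X_k, Y_{k+T}) \mid \mathcal{F}_{n-1}) = f(X_k, Y_{k+T}),$

hence

$|t_1| \le 4\|f\|_\infty \sum_{k=0}^{\infty} \|R^k - \mathbf{1}\pi_R\|_\infty,$

which together with (44) gives (42) with $\eta$ chosen as (43). □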



Theorem 5.7. If $\xi_{j,T} < 0$, it holds that

(45) $\mathbb{P}(S_{j,T} \ge 0) = \mathbb{P}(S_{j,T} - \xi_{j,T} \ge -\xi_{j,T}) \le \exp\left( -\frac{\xi_{j,T}^2}{2 j \eta^2} \right)$

with $\eta$ chosen as in Lemma 5.6.

Proof. This follows directly from the Azuma-Hoeffding inequality for
the mean-zero martingale $(Z_n, \mathcal{F}_n)_{n=0}^{j}$, since it has increments uniformly
bounded by $\eta$. □
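To get a feeling for the size of the constant $\eta$ in (43) and of the bound (45), one can evaluate both for a small chain. The following sketch is purely illustrative: the two-state matrix `R`, its invariant distribution `pi`, the value of `f_max` and the drift used for `xi` are hypothetical choices, and the matrix norm is taken to be the maximal absolute row sum.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def row_sum_norm(G):
    """Maximal absolute row sum (the norm assumed in this sketch)."""
    return max(sum(abs(x) for x in row) for row in G)

def mixing_sum(R, pi, terms=200):
    """Truncated version of sum_{k>=0} ||R^k - 1 pi||."""
    n = len(R)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = 0.0
    for _ in range(terms):
        diff = [[power[i][j] - pi[j] for j in range(n)] for i in range(n)]
        total += row_sum_norm(diff)
        power = mat_mul(power, R)
    return total

def theorem_bound(xi, j, eta):
    """Right-hand side exp(-xi^2 / (2 j eta^2)) of (45)."""
    return math.exp(-xi ** 2 / (2.0 * j * eta ** 2))

# Hypothetical two-state chain; pi solves pi R = pi.
R = [[0.7, 0.3], [0.2, 0.8]]
pi = [0.4, 0.6]
f_max = 1.0                      # assumed value of the max-norm of f
total = mixing_sum(R, pi)        # geometric series, here 1.2 / (1 - 0.5)
eta = 6.0 * f_max * total
bound = theorem_bound(xi=-0.2 * 5000, j=5000, eta=eta)
```

Because the series is geometric, truncating at 200 terms changes nothing at double precision for this chain; slowly mixing chains would need more terms and give a larger $\eta$, hence a weaker bound.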

5.5. Mean value inequalities. We will apply the result in the previous
section by considering the Markov chain $(X_n, Y_n)_{n=1}^{j}$ under the exponentially
tilted measure $\mathbb{P}^*_{\pi^*}$ and $(Y_n)_{n \ge j+1}$ under $\mathbb{P}_{\pi_Q}$. To do so we will need to
establish inequalities relating the mean of $f(X_n, Y_n)$ to the mean of $f(X_n, Y_{n+T})$
[or $f(X_{n+T}, Y_n)$]. Let

$\mu^* = \mathbb{E}^*_{\pi^*}(f(X_1, Y_1)) = \pi^*(f)$

denote the stationary mean of $f(X_n, Y_n)$ under the exponentially tilted
measure and let

$\mu^*_T = \mathbb{E}^*_{\pi^*}(f(X_1, Y_{1+T}))$

denote the stationary mean when shifting the Y-sequence $T$ positions.

It was mentioned in Section 3 that the function $\varphi$ is log-convex. In the
following we will need results obtained in [16] about strict log-convexity of
$\varphi$-like functions.

Let $F$ be a finite set, $g : F \to \mathbb{R}$ any function and $R$ an irreducible $F \times F$
matrix of transition probabilities. Following Definition 2 in [16] we say that
$g$ is degenerate w.r.t. $R$ if there exists a constant $\gamma$ such that for all cycles
$x_1, \ldots, x_n$ w.r.t. $R$

$\sum_{k=1}^{n} g(x_k) = n\gamma.$

Let $\varphi(\theta)$ for $\theta \in \mathbb{R}$ be the spectral radius of the $F \times F$ matrix $\Phi(\theta)$ with
entries

$\Phi(\theta)_{x,y} = \exp(\theta g(y)) R_{x,y}.$

From Theorem 5 in [16] it follows that if $g$ is nondegenerate w.r.t. $R$, then
$\varphi$ is strictly log-convex, and if $g$ is degenerate w.r.t. $R$, then $\varphi(\theta) = \exp(\gamma\theta)$
(i.e., $\log \varphi$ is linear). The consequence that we will use repeatedly below
is that if $\varphi(0) = \varphi(\theta^*) = 1$ for $\theta^* > 0$ and if $g$ is degenerate w.r.t. $R$, then
necessarily $\varphi(\theta) = 1$ for all $\theta \in \mathbb{R}$ and the constant $\gamma$ equals 0. Thus if we
can find a single cycle $x_1, \ldots, x_n$ w.r.t. $R$ such that

(46) $\sum_{k=1}^{n} g(x_k) \ne 0,$

then $g$ cannot be degenerate w.r.t. $R$, and the function $\varphi$ becomes strictly
(log-)convex. Most importantly, we can conclude that $\partial_\theta \varphi(0) < 0$.
Lemma 5.8. With $\pi^*_1$ and $\pi^*_2$ denoting the marginals of $\pi^*$ it holds that
$\pi^*_1 \otimes \pi_Q(f) < \mu^*$ as well as $\pi_P \otimes \pi^*_2(f) < \mu^*$.

Proof. We consider $(X_n, Y_n)_{n \ge 1}$ under the tilted measure $\mathbb{P}^*_{\pi^*}$ and an
independent stationary Markov chain $(Z_n)_{n \ge 1}$ with transition probabilities
$Q$. Then

$(X_n, Y_n, Z_n)_{n \ge 1}$

is a Markov chain on $E^3$, and we define the function $\tilde f$ on $E^3$ by

$\tilde f(x, y, z) = f(x, z) - f(x, y).$

The Markov chain has transition probabilities given by

$R^*_{(x,y),(x',y')} Q_{z,z'} = \frac{r^*(x', y')}{r^*(x, y)} \exp(\theta^* f(x', y')) P_{x,x'} Q_{y,y'} Q_{z,z'},$

with invariant distribution $\pi^* \otimes \pi_Q$. We also introduce the $\tilde\Phi(\theta)$ matrix

$\tilde\Phi(\theta)_{(x,y,z),(x',y',z')} = \exp(\theta(f(x', z') - f(x', y'))) R^*_{(x,y),(x',y')} Q_{z,z'}.$

With $\tilde\varphi(\theta)$ the spectral radius of $\tilde\Phi(\theta)$ we have that $\tilde\varphi(0) = \tilde\varphi(\theta^*) = 1$ since
$\tilde\Phi(0)$ is stochastic and $\tilde\Phi(\theta^*)$ has a right eigenvector with eigenvalue 1 having
entries $r^*(x, z)/r^*(x, y)$. Moreover, (8) provides the necessary cycle to show
that (46) holds, and since

$\partial_\theta \tilde\varphi(0) = \pi^* \otimes \pi_Q(\tilde f) = \pi^*_1 \otimes \pi_Q(f) - \pi^*(f) = \pi^*_1 \otimes \pi_Q(f) - \mu^*$

by (9), it follows that $\pi^*_1 \otimes \pi_Q(f) < \mu^*$. The second inequality follows
similarly. □

Lemma 5.9. The sequence $(\mu^*_T)_{T \ge 1}$ is convergent, and

$\mu^*_\infty := \lim_{T \to \infty} \mu^*_T < \mu^*.$

Proof. We first observe that

$\mu^*_T = \mathbb{E}^*_{\pi^*}(f(X_1, Y_{1+T})) \to \pi^*_1 \otimes \pi^*_2(f)$

for $T \to \infty$, where $\pi^*_1$ and $\pi^*_2$ are the marginals of $\pi^*$.

We consider $(X_n, Y_n)_{n \ge 1}$ under the tilted measure $\mathbb{P}^*_{\pi^*}$ and let $(W_n, Z_n)_{n \ge 1}$
be an independent copy with the same distribution. Then

$(X_n, Y_n, W_n, Z_n)_{n \ge 1}$

is a Markov chain on $E^4$ with transition probabilities $R^*_{(x,y),(x',y')} R^*_{(w,z),(w',z')}$
and invariant distribution $\pi^* \otimes \pi^*$. We define the function $f_\infty$ on $E^4$ by

$f_\infty(x, y, w, z) = f(x, z) + f(w, y) - f(x, y) - f(w, z).$

Introduce the corresponding $\Phi_\infty(\theta)$ matrix by

$\Phi_\infty(\theta)_{(x,y,w,z),(x',y',w',z')} = \exp(\theta f_\infty(x', y', w', z')) R^*_{(x,y),(x',y')} R^*_{(w,z),(w',z')}$

and its spectral radius $\varphi_\infty(\theta)$. By arguments analogous to those in Lemma
5.8 we conclude that $\varphi_\infty(0) = \varphi_\infty(\theta^*) = 1$, and that $\partial_\theta \varphi_\infty(0) = 2\mu^*_\infty - 2\mu^* < 0$.
Hence $\mu^*_\infty < \mu^*$. Regarding $\partial_\theta \varphi_\infty(0) < 0$, we can again use (8) to verify
that (46) holds. □

It is interesting and very useful that the inequality in Lemma 5.9 holds 
not only in the limit but in fact for all T. 



Lemma 5.10. For all $T \ge 1$ it holds that

(47) $\mu^*_T < \mu^*.$

Proof. With $S_n^T = \sum_{k=1}^{n} f(X_k, Y_{k+T})$ and $S_n = \sum_{k=1}^{n} f(X_k, Y_k)$ we observe
that $S_n \overset{\mathcal{D}}{=} S_n^T$ under $\mathbb{P} = \mathbb{P}_\pi$, since the X- and Y-sequences are
independent, stationary Markov chains. By (35) this implies that

(48) $\frac{1}{n} \log \mathbb{E}(\exp(\theta S_n^T)) \to \log \varphi(\theta)$

for $n \to \infty$.

Consider first the case $T = 1$ and the Markov chain

$(X_n, X_{n+1}, Y_n, Y_{n+1})_{n \ge 1},$

which under the tilted measure has transition probabilities

$R^*_{(x,w,y,z),(x',w',y',z')} = \frac{r^*(w', z')}{r^*(w, z)} \exp(\theta^* f(w', z')) P_{w,w'} Q_{z,z'} \delta_{w,x'} \delta_{z,y'}.$

Introduce the matrix

$\Phi_1(\theta)_{(x,w,y,z),(x',w',y',z')} = \exp(\theta(f(x', z') - f(w', z'))) R^*_{(x,w,y,z),(x',w',y',z')}$

and its spectral radius $\varphi_1(\theta)$. Clearly, $\varphi_1(0) = 1$ and we observe that

$\Phi_1(\theta^*)_{(x,w,y,z),(x',w',y',z')} = \frac{r^*(w', z')}{r^*(w, z)} \exp(\theta^* f(x', z')) P_{w,w'} Q_{z,z'} \delta_{w,x'} \delta_{z,y'}.$

The matrix $\Phi_1(\theta^*)$ has the same spectrum if we remove the eigenvector
fraction, hence (35) together with (48) imply that

$\log \varphi_1(\theta^*) = \lim_{n \to \infty} \frac{1}{n} \log \mathbb{E}(\exp(\theta^* S_n^1)) = \log \varphi(\theta^*) = 0,$

thus $\varphi_1(\theta^*) = 1$.

Furthermore, by (9) $\partial_\theta \varphi_1(0) = \mu^*_1 - \mu^*$. Using (8) (for $T = 1$) together
with (46) we find that $\partial_\theta \varphi_1(0) < 0$, hence

$\mu^*_1 < \mu^*.$

A similar argument for $T \ge 2$ is possible by introducing the Markov chain

$(X_n, \ldots, X_{n+T}, Y_n, \ldots, Y_{n+T})_{n \ge 1}$

and a function $f_T$ given by

$f_T(x_0, \ldots, x_T, y_0, \ldots, y_T) = f(x_0, y_T) - f(x_T, y_T).$

The spectral radius $\varphi_T(\theta)$ of the corresponding matrix $\Phi_T(\theta)$ fulfills that
$\varphi_T(0) = \varphi_T(\theta^*) = 1$ and that $\partial_\theta \varphi_T(0) = \mu^*_T - \mu^* < 0$, using (8) to show that
(46) holds. Thus $\mu^*_T < \mu^*$. □

5.6. Variables shared in both sequences. We define for $i, j, m, T \ge 1$ with
$i < j$

$S_1 = \sum_{k=1}^{i} f(X_k, Y_k), \qquad S_2 = \sum_{k=i+1}^{j} f(X_k, Y_k),$

$\tilde S_2 = \sum_{k=i+1}^{j} f(X_k, Y_{k+T}), \qquad S_3 = \sum_{k=j+1}^{i+m} f(X_k, Y_{k+T}).$

Lemma 5.11. There exist an $\varepsilon > 0$ and some $K$ (both independent of
$T$) such that

(49) $\mathbb{P}(S_1 + S_2 > t,\ \tilde S_2 + S_3 > t) \le K \exp(-\theta^*(1 + \varepsilon)t)$

for $t > 0$.

Proof. Assume first that the number of variables $j - i$ in the overlapping
part is small, less than $t(4\|f\|_\infty)^{-1}$, say, in which case we obtain the
estimate

$\mathbb{P}(S_1 + S_2 > t,\ \tilde S_2 + S_3 > t) \le \mathbb{P}(S_1 > 3t/4,\ S_3 > 3t/4) \le \rho\, \mathbb{P}(S_1 > 3t/4)\, \mathbb{P}(S_3 > 3t/4) \le K \exp(-3\theta^* t/2),$

using Lemma 5.1 for the second inequality and then a standard exponential
change of measure argument; see (36). This implies (49) with $\varepsilon = 1/2$.

If instead $j - i \ge t(4\|f\|_\infty)^{-1}$ we observe that

(50) $\mathbb{P}(S_1 + S_2 > t,\ \tilde S_2 + S_3 > t) \le \mathbb{P}(S_1 + S_2 > t,\ \tilde S_2 \ge S_2) + \mathbb{P}(\tilde S_2 + S_3 > t,\ S_2 \ge \tilde S_2).$

With $L^*_j = r^*(X_j, Y_j)/r^*(X_0, Y_0) \exp(\theta^*(S_1 + S_2))$ we obtain

$\mathbb{P}(S_1 + S_2 > t,\ \tilde S_2 \ge S_2) = \mathbb{E}^*_j\left( \frac{1}{L^*_j};\ S_1 + S_2 > t,\ \tilde S_2 \ge S_2 \right) \le \gamma \exp(-\theta^* t)\, \mathbb{P}^*_j(\tilde S_2 \ge S_2),$

where $\mathbb{P}^*_j$ denotes the tilted measure up to index $j$. Using Lemma 5.1, we
can, at the expense of a factor $\rho$, assume that the sequence $(X_n, Y_n)_{n=1}^{j}$ is
a stationary Markov chain under the tilted measure and that $(Y_n)_{n \ge j+1}$ is
independent and stationary under the original measure. Under this assumption
it follows that the mean of $\tilde S_2 - S_2$ equals $(j - T - i)\mu^*_T + T \pi^*_1 \otimes \pi_Q(f) - (j - i)\mu^*$.
Using Lemmas 5.8, 5.9 and 5.10 we can find a $\zeta > 0$, independent
of $T$, such that

$(j - T - i)\mu^*_T + T \pi^*_1 \otimes \pi_Q(f) - (j - i)\mu^* \le -(j - i)\zeta.$

Hence Theorem 5.7 gives that

$\mathbb{P}^*_j(\tilde S_2 \ge S_2) \le \rho \exp\left( -\frac{(j - i)\zeta^2}{2\eta^2} \right) \le \rho \exp\left( -\frac{t \zeta^2}{8\|f\|_\infty \eta^2} \right)$

or, with $\varepsilon = \zeta^2 (8 \theta^* \|f\|_\infty \eta^2)^{-1}$,

$\mathbb{P}(S_1 + S_2 > t,\ \tilde S_2 \ge S_2) \le \rho\gamma \exp(-\theta^*(1 + \varepsilon)t).$

Of course, a similar argument takes care of the second term in (50) and (49)
follows. □

5.7. Useful mixing inequalities. When the aligned sequences are i.i.d. the
sets $B_a$ entering Theorem 4.1 are usually chosen such that $V_a$ and $\mathcal{F}_a$ are
independent, in which case $\mathbb{E}|\mathbb{E}(V_a \mid \mathcal{F}_a) - \mathbb{E}(V_a)| = 0$ and the term $\beta_{4,n}$ in
Theorem 4.1 vanishes. In the framework of Markov chains we need to control
$\beta_{4,n}$ by using exponential $\beta$-mixing of stationary, finite state-space Markov
chains. To this end we need a few results on how to translate knowledge
about the $\beta$-mixing coefficients into useful bounds on $\mathbb{E}|\mathbb{E}(V_a \mid \mathcal{F}_a) - \mathbb{E}(V_a)|$.

For two $\sigma$-algebras $\mathcal{F}$ and $\mathcal{G}$ the $\alpha$-mixing measure of dependence is

$\alpha(\mathcal{F}, \mathcal{G}) = \sup_{A \in \mathcal{F},\, B \in \mathcal{G}} |\mathbb{P}(A \cap B) - \mathbb{P}(A)\mathbb{P}(B)|.$

The following lemma relates $\alpha$-mixing measures to mean values of the desired
form.
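Lemma 5.12. Let $\mathcal{F}$ and $\mathcal{G}$ be $\sigma$-algebras and let $A \in \mathcal{G}$. With $\eta = 1(A)$

(51) $\mathbb{E}|\mathbb{E}(\eta \mid \mathcal{F}) - \mathbb{E}(\eta)| \le 2\alpha(\mathcal{F}, \mathcal{G}).$

Proof. With $B = (\mathbb{E}(\eta \mid \mathcal{F}) > \mathbb{E}(\eta)) \in \mathcal{F}$ and $\xi = 1(B)$ we see that

$\mathbb{E}|\mathbb{E}(\eta \mid \mathcal{F}) - \mathbb{E}(\eta)| = \mathbb{E}(\xi(\mathbb{E}(\eta \mid \mathcal{F}) - \mathbb{E}(\eta))) - \mathbb{E}((1 - \xi)(\mathbb{E}(\eta \mid \mathcal{F}) - \mathbb{E}(\eta)))$
$= 2(\mathbb{E}(\xi\eta) - \mathbb{E}(\xi)\mathbb{E}(\eta))$
$= 2(\mathbb{P}(A \cap B) - \mathbb{P}(A)\mathbb{P}(B)) \le 2\alpha(\mathcal{F}, \mathcal{G}).$ □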
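The identity used in this proof can be verified exactly on a finite probability space. In the sketch below, the six-point sample space, its probabilities, the partition generating $\mathcal{F}$ and the event $A$ are all hypothetical choices made for illustration; exact rational arithmetic is used so that the two sides can be compared without rounding.

```python
from fractions import Fraction

def both_sides(prob, partition, A):
    """Return (E|E(eta|F) - E(eta)|, 2 (P(A and B) - P(A) P(B))) where
    eta = 1(A), F is generated by `partition`, and B = (E(eta|F) > E(eta))."""
    p_A = sum(prob[w] for w in A)
    lhs = Fraction(0)
    B = set()
    for block in partition:
        p_block = sum(prob[w] for w in block)
        cond = sum(prob[w] for w in block if w in A) / p_block  # E(eta|F) on block
        lhs += p_block * abs(cond - p_A)
        if cond > p_A:
            B.update(block)
    p_B = sum(prob[w] for w in B)
    p_AB = sum(prob[w] for w in A if w in B)
    rhs = 2 * (p_AB - p_A * p_B)
    return lhs, rhs

# Hypothetical example: six outcomes, F generated by three blocks.
prob = [Fraction(k, 12) for k in (1, 2, 3, 1, 2, 3)]
partition = [[0, 1], [2, 3], [4, 5]]
A = {0, 2, 3, 4}
lhs, rhs = both_sides(prob, partition, A)
```

Both sides agree exactly for this example, and the bound (51) then follows because the common value is at most twice the supremum defining the $\alpha$-mixing measure.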

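The $\beta$-mixing measure of dependence between the $\sigma$-algebras $\mathcal{F}$ and $\mathcal{G}$ is
defined as

$\beta(\mathcal{F}, \mathcal{G}) = \mathbb{E}\left( \sup_{A \in \mathcal{F}} |\mathbb{P}(A \mid \mathcal{G}) - \mathbb{P}(A)| \right).$

For a stationary stochastic process $(Z_n)_{n \in \mathbb{Z}}$ and for a subset $I \subset \mathbb{Z}$ we
define the $\sigma$-algebra $\mathcal{F}_I = \sigma(Z_n; n \in I)$. The $\beta$-mixing coefficient is defined
as

(52) $\beta(n) = \beta(\mathcal{F}_{[n,\infty)}, \mathcal{F}_{(-\infty,0]}) = \mathbb{E}\left( \sup_{A \in \mathcal{F}_{[n,\infty)}} |\mathbb{P}(A \mid \mathcal{F}_{(-\infty,0]}) - \mathbb{P}(A)| \right),$

for $n \ge 1$ and the process $(Z_n)_{n \in \mathbb{Z}}$ is called $\beta$-mixing if $\beta(n) \to 0$ for $n \to \infty$.
For two subsets $I, J \subset \mathbb{Z}$, the distance, $d(I, J)$, between the sets is defined
as

$d(I, J) = \inf_{n \in I,\, m \in J} |n - m|.$

If $I, J \subset \mathbb{Z}$, we write $I < J$ if $n < m$ for all $n \in I$ and $m \in J$.

Lemma 5.13. Assume that $I_1 < J < I_2$ are three subsets of $\mathbb{Z}$. With
$I = I_1 \cup I_2$ it holds that

$\alpha(\mathcal{F}_I, \mathcal{F}_J) \le 3\beta(d(I, J)).$

This result is Theorem 3.1 in [20]. See also [9], Theorem 1.3.3 for a slightly
more general result.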

5.8. Proof of the Poisson approximation. We recall the notation from
Remarks 3.6 and 3.7 where

$\tau_-(1) = \inf\{n > 0 \mid S_n < 0\}$

and we let $\varepsilon_\delta = \varepsilon_{(0,0),\delta}$ such that $S_\delta = \delta \varepsilon_\delta(f)$.

Lemma 5.14. There exist constants $K, c > 0$ such that for all $n \ge 1$

(53) $\mathbb{P}(\tau_-(1) > n) \le K \exp(-cn).$

Moreover, for any $\eta > 0$ there exist constants $K(\eta), c(\eta) > 0$ such that for
all $\delta \ge 1$

(54) $\mathbb{P}(S_\delta > t,\ d(\varepsilon_\delta, \tilde\pi) > \eta) \le K(\eta) \exp(-\theta^* t - c(\eta)\delta).$
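Proof. We first note that there exists $\theta > 0$ such that $\log \varphi(\theta) < 0$ due
to $\log \varphi(0) = 0$ and $\partial_\theta \log \varphi(0) = \partial_\theta \varphi(0) = \mu < 0$. Choose such a $\theta > 0$ and let
$c = -\log \varphi(\theta) > 0$; for optimality we may choose the $\theta$ that minimizes $\log \varphi(\theta)$.
Exponential change of measure gives

$\mathbb{P}(\tau_-(1) > n) = \mathbb{E}^{\theta f}\left( \frac{1}{L^{\theta f}_{\tau_-(1)}};\ n < \tau_-(1) < \infty \right)$
$\le K_0\, \mathbb{E}^{\theta f}(\exp(-c\tau_-(1) - \theta S_{\tau_-(1)});\ n < \tau_-(1) < \infty)$
$\le K_0 \exp(-cn)\, \mathbb{E}^{\theta f}(\exp(-\theta S_{\tau_-(1)});\ n < \tau_-(1) < \infty)$
$\le K \exp(-cn).$

Here $K_0$ is the maximum of the eigenvector fractions and

$K = K_0\, \mathbb{E}^{\theta f}(\exp(-\theta S_{\tau_-(1)})),$

which is finite because $0 \ge S_{\tau_-(1)} \ge \min_{x,y} f(x, y)$. This shows (53).
For the second inequality we find that

$\mathbb{P}(S_\delta > t,\ d(\varepsilon_\delta, \tilde\pi) > \eta) = \mathbb{E}^*\left( \frac{1}{L^*_\delta};\ S_\delta > t,\ d(\varepsilon_\delta, \tilde\pi) > \eta \right) \le K_0 \exp(-\theta^* t)\, \mathbb{P}^*(d(\varepsilon_\delta, \tilde\pi) > \eta).$

Large deviation theory for Markov chains gives that

$\limsup_{\delta \to \infty} \frac{1}{\delta} \log \mathbb{P}^*(d(\varepsilon_\delta, \tilde\pi) > \eta) \le -\inf_{\nu : d(\nu, \tilde\pi) \ge \eta} I_2(\nu),$

where the infimum is taken over all shift-invariant probability measures on
$E^2 \times E^2$. Here the rate-function $I_2$ is continuous and $I_2(\nu) > 0$ for all $\nu \ne \tilde\pi$.
We refer to Definition III.23, Theorem IV.3 and Lemma IV.5 in [8].
Consequently we can choose $K_0(\eta), c(\eta) > 0$, with $c(\eta) < \inf_{\nu : d(\nu, \tilde\pi) \ge \eta} I_2(\nu)$, such
that for all $\delta \ge 1$

$\mathbb{P}^*(d(\varepsilon_\delta, \tilde\pi) > \eta) \le K_0(\eta) \exp(-c(\eta)\delta).$

We conclude that (54) holds with $K(\eta) = K_0 K_0(\eta)$. □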

Lemma 5.15. If we, for some $x \in \mathbb{R}$, let

(55) $t = t_n = \frac{\log K^* + \log n^2 + x}{\theta^*}$

and assume that $(l_n)_{n \ge 1}$ is a sequence of positive integers satisfying

$\lim_{n \to \infty} l_n^{-1} \log n = \lim_{n \to \infty} n^{-1} l_n = 0,$

then with $x_n = \theta^*(t_n - \lfloor t_n \rfloor) \in [0, \theta^*)$ it holds that

$\sum_{a \in I} \mathbb{E}(V_a(t_n, l_n, \eta)) \sim \mathbb{E}(C_n(t_n)) \sim \exp(-x + x_n)$

for all $\eta > 0$.
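For concreteness, the threshold in (55), the lattice correction $x_n$ and the resulting Poisson mean $\exp(-x + x_n)$ are easy to evaluate. The sketch below is a hedged illustration: the numerical values of `K_star` and `theta_star` are made up, the function names are hypothetical, and reading $\exp(-\lambda_n)$ as an approximate probability of no exceedance relies on the Poisson approximation established in this section.

```python
import math

def threshold(n, x, K_star, theta_star):
    """t_n = (log K* + log n^2 + x) / theta*, as in (55)."""
    return (math.log(K_star) + 2.0 * math.log(n) + x) / theta_star

def poisson_mean(n, x, K_star, theta_star):
    """Return (t_n, x_n, exp(-x + x_n)) with x_n = theta* (t_n - floor(t_n))."""
    t_n = threshold(n, x, K_star, theta_star)
    x_n = theta_star * (t_n - math.floor(t_n))
    return t_n, x_n, math.exp(-x + x_n)

# Hypothetical constants; in practice K* and theta* depend on the
# transition matrices and the score function.
t_n, x_n, lam_n = poisson_mean(n=1000, x=1.0, K_star=0.1, theta_star=0.3)
p_no_exceedance = math.exp(-lam_n)  # Poisson probability of zero counts
```

The correction $x_n$ oscillates in $[0, \theta^*)$ with $n$, which is why the limit is stated along the sequence $\exp(-x + x_n)$ rather than as a single Gumbel limit.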

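Proof. Introduce the probabilities

$p(n, x, y) = \mathbb{P}_{x,y}\left( \max_{\delta : \delta \le \tau_-(1)} S_\delta > t_n \right) = \mathbb{P}_{x,y}\left( \max_{\delta : \delta \le \tau_-(1)} S_\delta > \lfloor t_n \rfloor \right)$

and

$\tilde p(n, x, y) = \mathbb{P}_{x,y}\left( \max_{\delta : \delta \le \tau_-(1) \wedge l_n \text{ and } d(\varepsilon_\delta, \tilde\pi) \le \eta} S_\delta > t_n \right).$

Furthermore, for $a = (i, j) \in I$ let

$q(a, x, y) = \mathbb{P}(T_a = 0,\ X_i = x,\ Y_j = y).$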

Using the Markov property we find that the conditional probability of the 
event (max 5:S < A ( a ) S£ a ,s(f) > t n ), conditionally on (T a = 1 X i = x,Yj = y), 
is smaller than p{n, x, y) because A(a) is restricted by the boundaries of the 
score matrix. Thus 

(56) F[T a = 0, max Se aS (f) >t n J < V p(n,x,y)q(a,x,y), 

V <5:5<A(a) J 

which by (23) gives 

E(C n (t n )) < p(n,x,y)^2q(a,x,y). 

x,y£E a£l 

With / = {(i,j) € I\i,j < n — l n } we find for a = (i,j) £ I, by conditioning 
on the event (T a = 0, Xi = x, Yj =y), that 

F(v a = i)= Yl p{ n i x iy)q{ a i x iv) 

x,y£E 

and hence 

J2E(V a )= ^ p(n,x,y)^2q(a,x,y). 

Since by construction J2aei^a < C n (t n ) we get the following chain of 
inequalities: 

x,y£E a£ j a£l 

<E(c„(i n ))< Y p( n , x >y)Yl q ( a > x >y}- 

x,ydE a£l 
We are done once we have shown that the lower and upper bounds both
behave as $\exp(-x + x_n)$. To this end, first note that by (19)

p(n, x, y) exp(6»* [t n \ )^e(x,y), 

for n — > oo and as a consequence of (18), essentially considering the score 
matrix one diagonal at the time, we find that 

a&I r 

for n — > oo. This gives that 

exp(x-x n ) ^2 p( n , x >y)^2<i( a > x iy) 

x,y£E a£l 

= J^Y1 P{n,x,y)exp{e*[t n \) n~ 2 ^2q(a,x,y) 

x,y£E a£l 

^-^7— e(x,y)u(x,y) = l 

A ^" x,yeE 

for n — > oo. 

Regarding the lower bound, we observe that

$\tilde p(n, x, y) \le p(n, x, y) \le \tilde p(n, x, y) + \mathbb{P}(\tau_-(1) > l_n) + \mathbb{P}(\exists \delta \le l_n : S_\delta > t_n,\ d(\varepsilon_\delta, \tilde\pi) > \eta).$

Since $l_n^{-1} \log n \to 0$ for $n \to \infty$ we conclude from (53) that $\mathbb{P}(\tau_-(1) > l_n) = o(\exp(-\theta^* t_n))$.
For the last probability on the right-hand side above we first
observe that if $S_\delta > t_n$, then necessarily $\delta \ge \|f\|_\infty^{-1} t_n$. Thus using (54) and
that $n^{-1} l_n \to 0$ for $n \to \infty$ we see that

$\mathbb{P}(\exists \delta \le l_n : S_\delta > t_n,\ d(\varepsilon_\delta, \tilde\pi) > \eta) \le l_n K(\eta) \exp(-(\theta^* + c(\eta)\|f\|_\infty^{-1}) t_n) = o(\exp(-\theta^* t_n)).$

Hence

$\tilde p(n, x, y) \exp(\theta^* \lfloor t_n \rfloor) \to e(x, y),$

for $n \to \infty$. Since $n^{-1} l_n \to 0$ we also have that



2V C \ K^S/) 

> .q(a,x,y) ->■ 

/J 



J ^g(a,x,y) 



for re — > 00. By an argument similar to that above 

exp(x-x n ) P( n ,x,y)^2q{a,x,y) -» 1 

for n — > 00, and this completes the proof. □ 
Lemma 5.16. With $(t_n)_{n \ge 1}$ and $(l_n)_{n \ge 1}$ chosen as in Lemma 5.15,
assuming in addition that

$\lim_{n \to \infty} n^{-\varepsilon} l_n = 0$

for all $\varepsilon > 0$, then under the assumptions in Theorem 3.1, the conditions in
Theorem 4.1 are fulfilled for some $\eta > 0$ with

$\lambda_n = \exp(-x + x_n),$

that is,

$\left\| \mathcal{D}\left( \sum_{a \in I} V_a(t_n, l_n, \eta) \right) - \mathrm{Poi}(\exp(-x + x_n)) \right\| \to 0.$



Proof. We define the neighborhood of strong dependence, $B_a$ for $a \in I$,
as follows. Define for $a = (i, j) \in I$

$B_a^1 = \{(k, m) \in I \mid |k - i| < 2l_n\}, \qquad B_a^2 = \{(k, m) \in I \mid |m - j| < 2l_n\},$

and $B_a = B_a^1 \cup B_a^2$.

Note that (36) provides the bound $\mathbb{E}(V_a) \le K \exp(-\theta^* t_n)$, and since $|I| = n^2$
and $|B_a| \le 4n l_n$, then

$\sum_{a \in I,\, b \in B_a} \mathbb{E}(V_a)\mathbb{E}(V_b) \le K' l_n n^{-1} \to 0$

for $n \to \infty$. This shows that (26) holds.

We prove that (27) is fulfilled by splitting the set $B_a$ into three disjoint
sets and, depending on the set, give a bound of $\mathbb{E}(V_a V_b)$ for $b$ in each of these
sets. For $a \in I$ let

$B_a = C_a \cup D_a^1 \cup D_a^2$

with $C_a$, $D_a^1$ and $D_a^2$ being the disjoint sets

$C_a = B_a^1 \cap B_a^2, \qquad D_a^1 = B_a^1 \setminus C_a, \qquad D_a^2 = B_a^2 \setminus C_a.$

Consider the case $b \in C_a$ and $b \ne a$. Using (32) together with Lemma 5.11
we can find an $\varepsilon > 0$ such that

$\mathbb{E}(V_a V_b) \le l_n^2 K \exp(-\theta^*(1 + \varepsilon)t_n).$

Hence, observing that $\sum_{a \in I} |C_a| \le 16 l_n^2 n^2$,

$\sum_{a \in I,\, b \in C_a,\, b \ne a} \mathbb{E}(V_a V_b) \le K' l_n^4 n^{-2\varepsilon} \to 0$

for $n \to \infty$.
For $b \in D_a^1$ use (32) together with Corollary 5.4, which applies due to (12),
to find $\eta, \varepsilon, K > 0$ such that

$\mathbb{E}(V_a V_b) \le l_n^2 K \exp(-(3/2 + \varepsilon)\theta^* t_n).$

Since $|D_a^1| \le 4 l_n n$ we conclude that

$\sum_{a \in I,\, b \in D_a^1} \mathbb{E}(V_a V_b) \le K' l_n^3 n^{-2\varepsilon} \to 0$

for $n \to \infty$. The same bound is obtainable for $b \in D_a^2$ (cf. the comment after
Corollary 5.4), and all in all we conclude that (27) is fulfilled.

The two-dimensional process $(X_n, Y_n)_{n \ge 1}$ is a stationary, irreducible Markov
chain on a finite state space, hence we can extend it to a doubly infinite,
stationary process $(X_n, Y_n)_{n \in \mathbb{Z}}$, which is exponentially $\beta$-mixing. The $\beta$-mixing
coefficients therefore satisfy

$\beta(n) \le K \exp(-\gamma n)$

for some constants $K, \gamma > 0$. For $a = (i, i) \in I$ we define $I_1 = (-\infty, i - l_n]$,
$I_2 = [i + 1, i + l_n]$, and $I_3 = [i + 2l_n + 1, \infty)$, for which $d(I_1 \cup I_3, I_2) = l_n + 1$.
With $I = I_1 \cup I_3$ and $J = I_2$, then clearly $\mathcal{F}_a \subset \mathcal{F}_I = \sigma(X_n, Y_n \mid n \in I_1 \cup I_3)$
and $V_a$ is measurable w.r.t. $\mathcal{F}_J = \sigma(X_n, Y_n \mid n \in I_2)$. By Lemmas 5.12 and
5.13 it follows that

$\mathbb{E}|\mathbb{E}(V_a \mid \mathcal{F}_a) - \mathbb{E}(V_a)| \le 2\alpha(\mathcal{F}_I, \mathcal{F}_J) \le 6\beta(l_n + 1) \le K' \exp(-\gamma l_n).$

For nondiagonal $a = (i, j) \in I$ we can shift the X-process by stationarity to
reduce the problem to the previous one and thus obtain the same bound.
This bound implies that

$\sum_{a \in I} \mathbb{E}|\mathbb{E}(V_a \mid \mathcal{F}_a) - \mathbb{E}(V_a)| \le K' n^2 \exp(-\gamma l_n) \to 0$

for $n \to \infty$. This shows that (28) holds, and combining the bounds obtained
in this proof with Lemma 5.15 we see that Theorem 4.1 gives the result. □

Remark 5.17. We have a little flexibility left in the choice of $(l_n)_{n \ge 1}$.
It does not matter how we choose this sequence precisely, as it is only an
intermediate, technical necessity for the proof. We just need to make sure
that a sequence can be chosen with the desired properties, and this is indeed
the case.

Finishing the proof of Theorem 3.1. Having proved Lemma 5.16
we only need to verify (24) according to Corollary 4.2. To this end we note
that by construction $\sum_{a \in I} V_a \le C_n(t_n)$, hence

$\mathbb{P}\left( \sum_{a \in I} V_a \ne C_n(t_n) \right) \le \mathbb{E}(C_n(t_n)) - \sum_{a \in I} \mathbb{E}(V_a) \to 0$

for $n \to \infty$ by Lemma 5.15. □
6. Concluding remarks. As mentioned in the Introduction, the result
is a generalization of that obtained by [6] for aligning independent i.i.d.
sequences. The overall strategy of constructing a counting variable that
approximates $C_n(t)$, and whose asymptotic behavior can be derived from [2],
Theorem 1, is identical to the strategy employed in [6], though we have chosen
an approximation whose relation to $C_n(t)$ seems more obvious. We have
also chosen to use exponential change of measure arguments to obtain most
of the needed inequalities, whereas Dembo, Karlin and Zeitouni [6] rely more
on combinatorial and large deviation inequalities.

One major challenge was to find an appropriate generalization of condition
(E') in [6] for the i.i.d. case. First the condition given by (22) was obtained
directly, but this condition is not able to completely recover the i.i.d. case.
Fortunately the referees insisted that another attempt should be made to
obtain the correct generalization of (E'). As it turned out, it is essential
in the construction of the variables $V_a$ to require that the pair-empirical
measure is close to $\tilde\pi$. Although this does not affect the asymptotic behavior
of the expectations $\mathbb{E}(V_a)$, it does provide bounds, as a result of Corollary 5.4,
on the expectations $\mathbb{E}(V_a V_b)$ that seem unobtainable otherwise.

Another major challenge was the generalization of the part of the proof of 
Lemma 2 in [6] called case (c), where a smart permutation argument relying 
on exchangeability of i.i.d. variables was used. The solution presented here, 
which works for Markov chains, is an application of the Azuma-Hoeffding 
inequality for martingales as described in Sections 5.4 and 5.5. 

Besides this, an extra argument based on mixing inequalities was needed
in order to take care of the $\beta_{4,n}$-term, which was not present in the i.i.d.
case.

Acknowledgment. Thanks are due to the referees for useful comments 
and for encouraging me to find a natural extension in the Markov setup of 
condition (E') in [6]. 